
45 papers found
When Do We Not Need Larger Vision Models?
Lecture Notes in Computer Science, 2024, 11 citations
Real-world humanoid locomotion with reinforcement learning
Science Robotics, 2024, 117 citations
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
arXiv (Cornell University), 2024, 3 citations
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
arXiv (Cornell University), 2024, 3 citations
Risks and Opportunities of Open-Source Generative AI
arXiv (Cornell University), 2024, 6 citations
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor
arXiv (Cornell University), 2023, 3 citations
Learning and Verification of Task Structure in Instructional Videos
arXiv (Cornell University), 2023, 4 citations
Self-correcting LLM-controlled Diffusion Models
arXiv (Cornell University), 2023, 3 citations
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
arXiv (Cornell University), 2023, 23 citations
Hierarchical Open-vocabulary Universal Image Segmentation
arXiv (Cornell University), 2023, 9 citations
Aligning Large Multimodal Models with Factually Augmented RLHF
arXiv (Cornell University), 2023, 12 citations
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
arXiv (Cornell University), 2023, 33 citations
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation
arXiv (Cornell University), 2023, 14 citations
QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 102 citations
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
arXiv (Cornell University), 2023, 20 citations
Large Language Models are Visual Reasoning Coordinators
arXiv (Cornell University), 2023, 14 citations
Fast Image-based Neural Relighting with Translucency-Reflection Modeling
arXiv (Cornell University), 2023, 4 citations
TOAST: Transfer Learning via Attention Steering
arXiv (Cornell University), 2023, 5 citations
Compositional Chain-of-Thought Prompting for Large Multimodal Models
arXiv (Cornell University), 2023, 4 citations