
Matei Zaharia
Researcher, co-founder at Databricks
USA
Data / AI infrastructure. Co-founder at Databricks.
Identification of cardiac wall motion abnormalities in diverse populations by deep learning of the electrocardiogram
npj Digital Medicine, 2025, 5 citations
Text2SQL is Not Enough: Unifying AI and Databases with TAG
arXiv (Cornell University), 2024, 3 citations
How Is ChatGPT’s Behavior Changing Over Time?
Harvard Data Science Review, 2024, 245 citations
Adaptive and Robust Query Execution for Lakehouses at Scale
Proceedings of the VLDB Endowment, 2024, 9 citations
Data Management for ML-Based Analytics and Beyond
ACM / IMS Journal of Data Science, 2024, 3 citations
RAFT: Adapting Language Model to Domain Specific RAG
arXiv (Cornell University), 2024, 26 citations
Image and data mining in reticular chemistry powered by GPT-4V
Digital Discovery, 2024, 50 citations
Semantic Operators: A Declarative Model for Rich, AI-based Data Processing
arXiv (Cornell University), 2024, 9 citations
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
arXiv (Cornell University), 2024, 5 citations
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
arXiv (Cornell University), 2024, 3 citations
World Model on Million-Length Video And Language With Blockwise RingAttention
arXiv (Cornell University), 2024, 11 citations
Optimizing LLM Queries in Relational Data Analytics Workloads
arXiv (Cornell University), 2024, 6 citations
Long Context RAG Performance of Large Language Models
arXiv (Cornell University), 2024, 5 citations
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data
Proceedings of the ACM on Management of Data, 2024, 28 citations
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data
arXiv (Cornell University), 2024, 3 citations
How is ChatGPT's behavior changing over time?
arXiv (Cornell University), 2023, 162 citations
R³: Record-Replay-Retroaction for Database-Backed Applications
Proceedings of the VLDB Endowment, 2023, 8 citations
Accelerating Aggregation Queries on Unstructured Streams of Data
Proceedings of the VLDB Endowment, 2023, 5 citations
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
arXiv (Cornell University), 2023, 49 citations
Data Acquisition: A New Frontier in Data-centric AI
arXiv (Cornell University), 2023, 3 citations
Epoxy: ACID Transactions across Diverse Data Stores
Proceedings of the VLDB Endowment, 2023, 15 citations
Ring Attention with Blockwise Transformers for Near-Infinite Context
arXiv (Cornell University), 2023, 11 citations
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
arXiv (Cornell University), 2023, 46 citations
Zelda: Video Analytics using Vision-Language Models
arXiv (Cornell University), 2023, 4 citations
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
arXiv (Cornell University), 2023, 4 citations