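In July 2024, Meta AI released Llama 3.1 405B, described as the first openly available model that rivals top AI models in state-of-the-art capabilities such as general knowledge, steerability, math, tool use, and multilingual translation. As part of the same release, they introduced upgraded versions of the 8B and 70B models. These models are multilingual and offer a significantly longer 128K context length, state-of-the-art tool use, and stronger overall reasoning.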
@misc{zhou2024dbgpthub,
title={DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models},
author={Fan Zhou and Siqiao Xue and Danrui Qi and Wenhui Shi and Wang Zhao and Ganglin Wei and Hongyang Zhang and Caigai Jiang and Gangwei Jiang and Zhixuan Chu and Faqiang Chen},
year={2024},
eprint={2406.11434},
archivePrefix={arXiv},
primaryClass={cs.DB}
}
Awesome Text2SQL🎉🎉🎉
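English | Chinese | Papers
This is a curated collection of tutorials and resources for large language models, Text2SQL, Text2DSL, Text2API, Text2Vis, and more.
🌱 How to Contribute
We warmly welcome contributions, whether you have spotted a typo or a bug, have a suggestion, or want to share a resource related to LLM+Text2SQL. For detailed guidelines on how to contribute, please see our CONTRIBUTING.md file.
🔔 Leaderboard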
| Rank | WikiSQL | Spider: Exact Match (EM) | Spider: Exact Execution (EX) | BIRD: Reward-based Valid Efficiency Score (R-VES) | BIRD: Execution Accuracy (EX) |
| --- | --- | --- | --- | --- | --- |
| 1 | (2021/05-SeaD+Execution-Guided Decoding) | (2023/11-MiniSeek) | (2023/11-MiniSeek) | (2024/08-OpenSearch-SQL, v2 + GPT-4o) | (2024/09-CHASE-SQL + Gemini) |
| 2 | (2021/03-SDSQL+Execution-Guided Decoding) | (2022/09-Graphix-3B + PICARD) | (2023/08-DAIL-SQL + GPT-4 + Self-Consistency) | (2024/08-ExSL + granite-34b-code) | (2024/09-AskData + GPT-4o) |
| 3 | (2020/11-IE-SQL+Execution-Guided Decoding) | (2022/09-CatSQL + GraPPa) | (2023/08-DAIL-SQL + GPT-4) | (2024/09-CHASE-SQL + Gemini) | (2024/08-OpenSearch-SQL, v2 + GPT-4o) |
| 4 | (2020/03-HydraNet+Execution-Guided Decoding) | (2022/09-SHiP + PICARD) | (2023/10-DPG-SQL + GPT-4 + Self-Correction) | (2024/07-Distillery + GPT-4o) | (2024/07-Distillery + GPT-4o) |
| 5 | (2020/12-BRIDGE+Execution-Guided Decoding) | (2022/05-G³R + LGESQL + ELECTRA) | (2023/04-DIN-SQL + GPT-4) | (2024/09-AskData + GPT-4o) | (2024/08-ExSL + granite-34b-code) |
| 6 | (2019/08-X-SQL+Execution-Guided Decoding) | (2022/08-RESDSQL+T5-1.1-lm100k-xl) | (2023/07-Hindsight Chain of Thought with GPT-4) | (2024/08-Insights AI) | (2024/08-Insights AI) |
| 7 | (2021/03-SDSQL) | (2022/05-T5-SR) | (2023/06-C3 + ChatGPT + Zero-Shot) | (2024/05-ExSL + granite-20b-code) | (2024/07-PURPLE + RED + GPT-4o) |
| 8 | (2020/12-BRIDGE) | (2022/12-N-best List Rerankers + PICARD) | (2023/07-Hindsight Chain of Thought with GPT-4 and Instructions) | (2024/07-RECAP + Gemini) | (2024/07-RECAP + Gemini) |
| 9 | (2021/04-Text2SQLGen + EG) | (2021/09-S²SQL + ELECTRA) | (2023/02-RESDSQL-3B + NatSQL) | (2024/07-PURPLE + RED + GPT-4o) | (2024/07-ByteBrain) |
| 10 | (2020/11-SeqGenSQL+EG) | (2023/02-RESDSQL-3B + NatSQL) | (2022/11-SeaD + PQL) | (2024/08-Arcwise + GPT-4o) | (2024/05-ExSL + granite-20b-code) |
📜 Contents
👋 Introduction
📖 Surveys
(2025-TKDE, CCF-A) Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [paper] [code]
(2024-arXiv) From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems [paper]
(2024-arXiv) Large Language Model Enhanced Text-to-SQL Generation: A Survey [paper]
(2024-arXiv) A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? [paper] [code]
(2024-arXiv) A Survey on Employing Large Language Models for Text-to-SQL Tasks [paper]
(2023-VLDB, CCF-A) A survey on deep learning approaches for text-to-SQL [paper]
(2022-TKDE, CCF-A) A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [paper]
(2022-COLING, CCF-B) Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect [paper]
(2022-arXiv) Deep Learning Driven Natural Languages Text to SQL Query Conversion: A Survey [paper]
💬 Classic Models
(2025-NAACL, CCF-B) You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL [paper]

(2025-EDBT, CCF-B) DBCᴏᴘɪʟᴏᴛ: Natural Language Querying over Massive Databases via Schema Routing [paper] [code]

(2024-arXiv, None) CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL [paper]

(2024-arXiv, None) E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [paper] [code]
(2024-arXiv, None) Distillery: The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models [paper]
(2024-arXiv, None) DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models [paper] [code]

(2024-arXiv, None) SuperSQL: The Dawn of Natural Language to SQL: Are We Fully Ready? [paper] [code]

(2024-arXiv, None) CHESS: Contextual Harnessing for Efficient SQL Synthesis [paper] [code]

(2023-arXiv, None) MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [paper] [code]

(2023-arXiv, None) Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [paper] [code]

(2023-AAAI, CCF-A) RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL [paper] [code]

(2023-arXiv, None) Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [paper] [code]
(2023-arXiv, None) DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [paper] [code]

(2023-arXiv, None) A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability [paper] [code]

(2023-ICLR, CCF-A) Binding Language Models in Symbolic Languages [paper] [code]

(2023-SIGMOD, CCF-A) Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning [paper] [code]

(2023-ICASSP, CCF-B) T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing [paper]

(2022-ACL, CCF-A) S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers [paper]

(2022-NAACL, CCF-B) SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising [paper]

(2022-EMNLP, CCF-B) STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing [paper] [code]

(2022-EMNLP, CCF-B) RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL [paper] [code]

(2022-EMNLP, CCF-B) CQR-SQL: Conversational Question Reformulation Enhanced Context-Dependent Text-to-SQL Parsers [paper]

(2022-ACL, CCF-A) HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing [paper]

(2022-arXiv, None) Importance of Synthesizing High-quality Data for Text-to-SQL Parsing [paper]
(2021-ACL, CCF-A) Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL [paper]

(2021-arXiv, None) Pay More Attention to History: A Context Modelling Strategy for Conversational Text-to-SQL [paper] [code]
(2021-ICLR, CCF-A) SCORE: Pre-training for Context Representation in Conversational Semantic Parsing [paper]

(2021-DASFAA, CCF-B) An Interactive NL2SQL Approach with Reuse Strategy [paper]
(2021-NAACL, CCF-B) Structure-Grounded Pretraining for Text-to-SQL [paper]

(2021-EMNLP, CCF-B) PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models [paper] [code]

(2021-ICLR, CCF-A) GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [paper] [code]

(2021-ACL, CCF-A) LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations [paper] [code]
(2020-EMNLP, CCF-B) Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [paper] [code]

(2020-ACL, CCF-A) TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [paper] [code]

(2020-ACL, CCF-A) RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers [paper] [code]

(2020-EMNLP, CCF-B) Mention Extraction and Linking for SQL Query Generation [paper]
(2020-EMNLP, CCF-B) IGSQL: Database Schema Interaction Graph Based Neural Model for Context-Dependent Text-to-SQL Generation [paper] [code]

(2020-arXiv, None) Hybrid Ranking Network for Text-to-SQL [paper] [code]
(2019-arXiv, None) X-SQL: reinforce schema representation with context [paper]
(2019-EMNLP, CCF-B) Global Reasoning over Database Structures for Text-to-SQL Parsing [paper] [code]
(2019-EMNLP, CCF-B) Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions [paper] [code]

(2019-ACL, CCF-A) Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing [paper] [code]
(2019-ACL, CCF-A) Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation [paper] [code]
(2018-EMNLP, CCF-B) SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task [paper] [code]
(2018-NAACL, CCF-B) TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation [paper] [code]
(2017-arXiv, None) SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning [paper] [code]
🔥 Base Models
Llama [paper] [code] [model]
ChatGLM [paper] [code] [model]
Alpaca [paper] [code] [model]
Vicuna [paper] [code] [model]
WizardLM [paper] [code] [model]
Falcon [paper] [code] [model]
ChatGLM2 [paper] [code] [model]
Baichuan-7b [code] [model]
Baichuan-13b [code] [model]
InternLM [paper] [code] [model]
Llama 2 [paper] [code] [model]
Code Llama [paper] [code] [model]
Qwen [paper] [code] [model]
Baichuan 2 [paper] [code] [model]
Phi-1.5 [paper] [model]
Mistral-7B [paper] [code] [model]
Deepseek [paper] [code] [model]
MiniCPM [paper] [code] [model]
Mixtral-8x22B [paper] [code] [model]
Llama 3 [paper] [code] [model]
Qwen-1.5-110B [paper] [code] [model]
Qwen2 [paper] [code] [model]
Llama 3.1 [paper] [code] [model]
Qwen2.5 [paper] [code] [model]
Llama 3.2 [paper] [code] [model]
💡 Fine-tuning
P-Tuning [paper] [code]
LoRA [paper] [code]
P-Tuning V2 [paper] [code]
RLHF [paper] [code]
RRHF [paper] [code]
QLoRA [paper] [code]
RLTF [paper] [code]
RRTF [paper]
RLAIF [paper]
💪 Datasets
WikiSQL [paper] [code] [dataset]
Spider [paper] [code] [dataset]
SParC [paper] [code] [dataset]
CSpider [paper] [code] [dataset]
CoSQL [paper] [code] [dataset]
TableQA [paper] [dataset]
DuSQL [paper] [dataset]
KaggleDBQA [paper] [code] [dataset]
CHASE [paper] [code] [dataset]
BIRD-SQL [paper] [code] [dataset]
BIRD-SQL Mini-Dev [paper] [code] [dataset]
Spider 2.0 [paper] [code] [dataset]
🌈 Evaluation Metrics
Execution Accuracy (EX) [paper]
Exact Match (EM) [paper]
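To make the two metrics concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under stated assumptions, not the official evaluation script of any benchmark: the function names (run_query, execution_match, exact_match, score) and the toy singer table are hypothetical. Execution Accuracy is approximated by comparing the rows returned by the predicted and gold queries as unordered multisets, and Exact Match is approximated by comparing whitespace- and case-normalized SQL strings. Official evaluators such as Spider's compare parsed SQL components rather than raw strings, so treat the EM part as a deliberate simplification.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same multiset of rows.

    Row order is ignored here, which is a simplification for ORDER BY queries.
    """
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, predicted_sql)
    return gold is not None and pred == gold


def normalize(sql):
    """Lowercase, drop semicolons, and collapse whitespace."""
    return " ".join(sql.lower().replace(";", " ").split())


def exact_match(predicted_sql, gold_sql):
    """Simplified EM: normalized string equality (an assumption, see lead-in)."""
    return normalize(predicted_sql) == normalize(gold_sql)


def score(conn, pairs):
    """Average both metrics over (predicted, gold) query pairs."""
    n = len(pairs)
    ex = sum(execution_match(conn, p, g) for p, g in pairs) / n
    em = sum(exact_match(p, g) for p, g in pairs) / n
    return {"EX": ex, "EM": em}


# Toy in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
""")

pairs = [
    # Same query up to case and semicolon: counts for EX and EM.
    ("SELECT name FROM singer WHERE age > 40", "select name from singer where age > 40;"),
    # Different text, same rows on this data: counts for EX only.
    ("SELECT name FROM singer WHERE age >= 45", "SELECT name FROM singer WHERE age > 40"),
    # Returns an extra row: counts for neither.
    ("SELECT name FROM singer", "SELECT name FROM singer WHERE age > 40"),
    # Invalid column, so execution fails: counts for neither.
    ("SELECT nam FROM singer", "SELECT name FROM singer"),
]

result = score(conn, pairs)
print(result)
```

The second pair shows why the two metrics can diverge: a prediction can be textually different from the gold query yet return the same result on a given database, so EX tends to be more lenient than EM, and it can also credit queries that are only coincidentally correct on the test data.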
📦 Libraries
🔧 Practice Projects
DB-GPT-Hub

sqlcoder

modal_finetune_sql

LLaMA-Efficient-Tuning

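🔗 Citation
If you find Text2SQL useful for your research or development, please cite the DB-GPT-Hub paper (BibTeX entry zhou2024dbgpthub, arXiv:2406.11434).
🤝 Friendly Links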
eosphoros

Awesome-AIGC-Tutorials
