stereoplegic's Collections: Layout
UI Layout Generation with LLMs Guided by UI Grammar
Paper
• 2310.15455
• Published • 3
You Only Look at Screens: Multimodal Chain-of-Action Agents
Paper
• 2309.11436
• Published • 1
Never-ending Learning of User Interfaces
Paper
• 2308.08726
• Published • 2
LMDX: Language Model-based Document Information Extraction and Localization
Paper
• 2309.10952
• Published • 67
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper
• 2309.08172
• Published • 14
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Paper
• 2309.09506
• Published • 15
DSG: An End-to-End Document Structure Generator
Paper
• 2310.09118
• Published • 2
On Web-based Visual Corpus Construction for Visual Document Understanding
Paper
• 2211.03256
• Published • 1
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper
• 2309.01131
• Published • 1
DocFormerv2: Local Features for Document Understanding
Paper
• 2306.01733
• Published • 1
OCR-free Document Understanding Transformer
Paper
• 2111.15664
• Published • 6
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Paper
• 2304.12484
• Published • 1
Understanding HTML with Large Language Models
Paper
• 2210.03945
• Published • 1
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Paper
• 2306.06094
• Published • 1
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
• 2401.00908
• Published • 189
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper
• 2401.02823
• Published • 36
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper
• 2311.07575
• Published • 15
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper
• 2311.06495
• Published • 12
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Paper
• 2401.13311
• Published • 12
Empowering LLM to use Smartphone for Intelligent Task Automation
Paper
• 2308.15272
• Published • 1
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
• 2404.12753
• Published • 43