Structured Data Extraction Using Large Language Models

Unmet Need: A Method for Long-Form Data Extraction with LLMs

Large Language Models (LLMs) excel at Natural Language Processing (NLP) tasks but struggle to extract information effectively from large databases. Current practices do not address the inherent limitations of LLMs, resulting in suboptimal performance on long-form data extraction tasks.

Researchers at Washington State University (WSU) have developed a multi-step method to overcome these limitations. The method combines vision-capable LLMs with a sliding window technique for data extraction. Applying and optimizing these techniques for long-form data extraction offers an innovative and practical solution to a growing challenge in the field and contributes to the evolution of NLP and LLM technologies.

The Technology: Innovative Multi-step Method for Enhanced Long-Form Data Extraction in LLMs

WSU researchers introduced a sliding window method to overcome the limitations of LLMs in long-form data extraction. The method addresses incomplete extractions caused by complicated input data structures, weak instruction following on complex extraction requirements, and small output context limits. This approach handles much larger datasets than previously possible with single-pass extraction methods while balancing cost, speed, and quality.
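
The general idea of a sliding window can be illustrated with a minimal Python sketch. This is not the WSU implementation, whose details are not described here. It assumes the input has already been split into units such as lines or pages, that consecutive windows overlap by a fixed number of units, and that duplicate records found in the overlap are dropped by exact match. The names sliding_windows, extract_long_form, and toy_extract are hypothetical, and toy_extract is a stand-in for a real LLM call.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Dict[str, str]


def sliding_windows(
    items: List[str], size: int, overlap: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, window) pairs that cover `items` with overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < len(items):
        yield start, items[start:start + size]
        if start + size >= len(items):
            break  # the last window already reaches the end of the input
        start += step


def extract_long_form(
    items: List[str],
    extract: Callable[[List[str]], List[Record]],
    size: int = 4,
    overlap: int = 1,
) -> List[Record]:
    """Run `extract` on each window and merge results, dropping exact duplicates."""
    seen = set()
    merged: List[Record] = []
    for _, window in sliding_windows(items, size, overlap):
        for record in extract(window):
            key = tuple(sorted(record.items()))
            if key not in seen:  # overlapping windows can return the same record twice
                seen.add(key)
                merged.append(record)
    return merged


def toy_extract(window: List[str]) -> List[Record]:
    """Hypothetical stand-in for an LLM call: parses 'name: value' lines."""
    records = []
    for line in window:
        if ":" in line:
            name, value = line.split(":", 1)
            records.append({"name": name.strip(), "value": value.strip()})
    return records


if __name__ == "__main__":
    lines = ["a: 1", "noise", "b: 2", "c: 3", "noise", "d: 4", "e: 5"]
    for record in extract_long_form(lines, toy_extract, size=3, overlap=1):
        print(record)
```

Because each call sees only a small window, the output of any single call stays short, which is one way to work around a limited output window. The overlap reduces the chance that a record lying on a window boundary is missed, at the cost of some repeated work.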

Applications:

  • Overcomes limitations such as the limited output window of LLMs and logical errors in generating extensive text
  • Facilitates efficient large-scale data mining
  • Extracts valuable insights from extensive medical records and research papers
  • Streamlines the analysis of lengthy legal documents
  • Accelerates the extraction of key findings from vast scientific research data

Advantages:

  • Enhanced scalability and accuracy
  • Maximized efficiency
  • Cost-efficiency
  • Flexibility and optimized performance

Patent Information:

A provisional patent application has been filed.

Learn More

Punam Dalai
Technology Licensing Associate
Washington State University
(509) 335-1216
punam.dalai@wsu.edu
Reference No: Software-25/3609

Inventors

Xiaofeng Guo
Haydn Anderson
Juejing Liu
Noah Waxman

Key Words

Data mining
Natural Language Processing