From the course: Fundamentals of AI Engineering: Principles and Practical Applications

Document parsing and structure recognition

- [Instructor] In our previous video, we covered the basics of text extraction, namely getting raw text from different document formats. Now we're going to take things a step further and explore document parsing and structure recognition. While basic text extraction gives us the content, it often loses the structure that helps humans understand documents. Let's dive into how we can use LlamaIndex to recognize and maintain document structure. Let's open up chapter_three and open notebook 03_03.ipynb. As always, in the upper right-hand corner, make sure that the venv you've selected is the .venv environment that's been pre-created in Codespaces. Before I say anything else, let's take one step back and think conceptually about how we as humans read data. When we read documents, we don't just process the words; we rely heavily on structure to understand the content. For example, headers signal new topics, lists organize related items, and tables present structured data. For AI systems, preserving the…
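To make the idea of structure concrete: the lesson uses LlamaIndex for this step, but the sketch that follows does not use LlamaIndex at all. It is a hypothetical, dependency-free illustration, assuming the text is already Markdown-like (`#` headers, `-` or `1.` list items, pipe tables). The helper names `classify`, `parse`, and `sections` are made up for this example. It tags each line as a header, list item, table row, or paragraph, and keeps every block tied to the path of headers above it, which is the kind of information plain text extraction throws away.

```python
import re
from dataclasses import dataclass

# Hypothetical block type for this sketch; not a LlamaIndex class.
@dataclass
class Block:
    kind: str       # "header", "list_item", "table_row", "table_sep", or "paragraph"
    text: str
    level: int = 0  # header depth (1-6); 0 for non-header blocks

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")          # "# Title", "## Subtitle", ...
LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+(.*)$")  # "- item", "1. item", "2) item"

def classify(line: str) -> Block:
    """Assign a single non-empty line to a structural category using simple heuristics."""
    m = HEADER_RE.match(line)
    if m:
        return Block("header", m.group(2).strip(), len(m.group(1)))
    m = LIST_RE.match(line)
    if m:
        return Block("list_item", m.group(1).strip())
    stripped = line.strip()
    if stripped.startswith("|") and stripped.endswith("|"):
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        # A row such as |---|:--:| only separates the table header from its body.
        if all(c and set(c) <= set("-:") for c in cells):
            return Block("table_sep", "")
        return Block("table_row", " | ".join(cells))
    return Block("paragraph", stripped)

def parse(text: str) -> list[Block]:
    """Classify every non-blank line and drop table separator rows."""
    blocks = [classify(line) for line in text.splitlines() if line.strip()]
    return [b for b in blocks if b.kind != "table_sep"]

def sections(blocks: list[Block]) -> dict[str, list[Block]]:
    """Group content blocks under their header path, e.g. 'Report > Findings'."""
    path: list[str] = []
    out: dict[str, list[Block]] = {}
    for b in blocks:
        if b.kind == "header":
            # A header of depth n replaces everything at depth n and deeper.
            path = path[: b.level - 1] + [b.text]
            continue
        key = " > ".join(path) or "(no header)"
        out.setdefault(key, []).append(b)
    return out

sample = """# Report
Intro text.
## Findings
- Revenue grew
- Costs fell
| Quarter | Revenue |
|---|---|
| Q1 | 10 |
"""

for heading, items in sections(parse(sample)).items():
    print(heading, [(b.kind, b.text) for b in items])
```

Real documents rarely arrive as clean Markdown, so these regex heuristics only illustrate the goal of keeping structure alongside the text; they are not how LlamaIndex performs parsing.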