Extraction Interface
The Data Extraction page is where you review and validate the data extracted from a document. This page allows you to ensure the accuracy of the extracted entities and manage the document's verification status.
Document Viewer
Main Display: The central area shows the document you are working on. You can view the document as text or as an image, depending on your preference.
Page Navigation: On the left, there is a vertical list of the document's pages. Each page is marked with its verification status (e.g., Verified, Unverified, Rejected), allowing you to quickly see which pages need attention.
Extracted Entities Panel
Entities List: On the right side, a panel lists all the entities extracted from the document. Entities are categorized by label (e.g., TOTAL_LIABILITIES_CURRENT_YEAR, INCOME_CURRENT_YEAR) and can be expanded to show more details. The panel also shows the output of other services in the workflow, for example the output of an LLM "Risk Factor Extraction" service and of a post-processing service.
Reviewing Extracted Data
Add Annotations: In the document viewer, you can label entities in the document to correct errors or to add entities that the extraction missed.
Edit Annotations: You can click on any highlighted entity within the document to adjust its selection. This is useful for correcting any errors in the extracted entities or the extracted OCR text.
Validating and Managing Data
Validate Data: After reviewing the extracted data, click the "Validate" button to confirm its accuracy. This updates the status of the document or specific pages to "Verified".
Reject Data: If the data extraction is incorrect, you can mark the document or specific pages as "Rejected" for further review and correction.
Delete Annotations: To remove all annotations from the document, use the "Delete all annotations" option in the Extracted Entities panel.
Download Extracted Data: Use the download icon to export the extracted data for further use.
Unclassify Document: If a document has been classified incorrectly, use the unclassify icon to reset its status and reprocess it.
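The page statuses described above (Verified, Unverified, Rejected) behave like a small per-page state machine. The Python sketch below is a hypothetical model for illustration only, not Kudra's API: the names ReviewDocument, validate, reject, and needs_attention are assumptions, and the sketch covers only page-level status changes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    UNVERIFIED = "Unverified"
    VERIFIED = "Verified"
    REJECTED = "Rejected"


@dataclass
class ReviewDocument:
    """Hypothetical review model; not Kudra's actual API."""

    page_count: int
    statuses: dict[int, Status] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every page starts out as Unverified.
        for page in range(1, self.page_count + 1):
            self.statuses.setdefault(page, Status.UNVERIFIED)

    def _set(self, status: Status, pages: Optional[list[int]]) -> None:
        # pages=None applies the status to the whole document.
        targets = list(self.statuses) if pages is None else pages
        for page in targets:
            if page not in self.statuses:
                raise ValueError(f"no such page: {page}")
            self.statuses[page] = status

    def validate(self, pages: Optional[list[int]] = None) -> None:
        self._set(Status.VERIFIED, pages)

    def reject(self, pages: Optional[list[int]] = None) -> None:
        self._set(Status.REJECTED, pages)

    def needs_attention(self) -> list[int]:
        # Pages not yet Verified (Unverified or Rejected), in page order.
        return sorted(p for p, s in self.statuses.items() if s is not Status.VERIFIED)
```

For example, on a three-page document, calling doc.validate([1]) and then doc.reject([3]) leaves pages 2 and 3 in the needs_attention() list; calling doc.validate() with no arguments marks every page as Verified.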
Navigation and Display Options
Switch between Named Entities and Relations Views: Use the drop-down menu at the top of the page to switch between viewing named entities and relations within the document.
Named entities are specific data points, while relations are connections between those data points.
Switch Views: Toggle between viewing the document as text or as an image to better analyze the data extraction.
Navigate Pages: Use the page navigation controls on the left to move through the document. This helps in reviewing and validating each page individually.
Zoom and Pan: Adjust the view of the document to closely inspect specific sections and ensure accurate data extraction.
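To make the distinction between named entities and relations concrete, here is a minimal Python sketch of how the two could be represented. The Entity and Relation shapes, their fields, and the relation label SAME_PERIOD are assumptions for illustration; Kudra's internal or exported format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named entity: one labeled span of document text (hypothetical shape)."""

    label: str
    text: str
    page: int
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass(frozen=True)
class Relation:
    """A relation: a labeled link between two entities (hypothetical shape)."""

    label: str
    head: Entity
    tail: Entity


def relations_of(entity: Entity, relations: list[Relation]) -> list[Relation]:
    # Every relation in which the entity appears at either end.
    return [r for r in relations if entity in (r.head, r.tail)]


liabilities = Entity("TOTAL_LIABILITIES_CURRENT_YEAR", "1,250,000", page=1, start=10, end=19)
income = Entity("INCOME_CURRENT_YEAR", "480,000", page=2, start=5, end=12)
same_period = Relation("SAME_PERIOD", head=liabilities, tail=income)  # hypothetical label
```

In this model the Named Entities view would list the Entity objects on their own, while the Relations view would show the Relation objects that connect them.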
Rapid Learning
Kudra lets you provide reference examples for the AI model to learn from when it extracts data. When the Rapid Learning option is enabled, every validated document is sent to the model as a reference to follow when extracting data from subsequently uploaded documents. Note that Rapid Learning is only available with the GPT Entity Extractor service.
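Rapid Learning resembles few-shot prompting: validated documents act as worked examples placed before the new document. This page does not say how Kudra formats these references, so the Python sketch below shows only one plausible arrangement; the function name build_prompt and the prompt layout are assumptions.

```python
def build_prompt(validated: list[tuple[str, dict[str, str]]], new_text: str) -> str:
    """Assemble a few-shot extraction prompt from validated documents (hypothetical)."""
    parts = ["Extract the labeled entities from the last document, following the examples."]
    # Each validated document becomes one worked example: its text plus the
    # entity values a reviewer confirmed during validation.
    for text, entities in validated:
        answer = "\n".join(f"{label}: {value}" for label, value in entities.items())
        parts.append(f"Document:\n{text}\nEntities:\n{answer}")
    # The new document goes last, with its answer left for the model to complete.
    parts.append(f"Document:\n{new_text}\nEntities:")
    return "\n\n".join(parts)
```

Because every validated document is appended as another example, the reference set grows as reviewers validate more documents; with an empty list, the prompt contains only the instruction and the new document.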
Reviewing and Validating Extracted Data
Select a Document: Choose the document you want to review from your project list.
Inspect Extracted Data: Look at the highlighted areas within the document to check if the extracted entities are correct.
Edit Annotations: Click on any entity to adjust its boundaries if necessary.
Validate or Reject: After reviewing, click the "Validate" button to confirm the data or mark it as "Rejected" if there are errors.
Page Navigation: Use the controls on the left to move through different pages and repeat the validation process for each one.
Managing Extracted Entities
View Entities: In the right panel, examine the list of extracted entities.
Edit or Delete: Click on an entity to edit its details or use the "Delete all annotations" option to remove all extracted data if you need to start over.