Automate Document Processing in Logistics using AI

Multi-modal transportation is one of the biggest developments in the logistics industry. There has been a successful collaboration across different transportation partners in supply chain freight forwarding for many decades. But there’s still a considerable overhead of paperwork processing for each leg of the trip.

Tens of billions of documents are processed in ocean freight forwarding alone. Using manual labor to process these documents (purchase orders, invoices, bills of lading, delivery receipts, and more) is both expensive and error-prone.

In this blog post, we’ll show how to automate document processing in the logistics industry. We’ll also show you how to integrate it with a centralized workflow management system.

Automated document processing architecture

Figure 1. Architecture of document processing workflow

The solution workflow shown in Figure 1 is as follows:

  1. Documents that belong to the same transaction are collected in an S3 bucket
  2. The document processing workflow is initiated
  3. The workflow orchestration is as follows:
    • Each document is processed automatically
    • Relevant entities are extracted
    • Extracted data is reviewed
    • Order data is consolidated

This architecture uses Amazon Simple Storage Service (Amazon S3) for document storage and Amazon Simple Queue Service (Amazon SQS) for workflow initiation. Amazon Textract is used for text extraction, Amazon Comprehend for entity extraction, and Amazon Augmented AI (A2I) for human review. Human review helps ensure correct results when predictions have low confidence.

We use AWS Step Functions to orchestrate the document processing workflow. Step Functions also helps improve application resiliency with less code.
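To make the orchestration more concrete, the following is a minimal sketch of how the workflow could be expressed in Amazon States Language, built as a Python dictionary. The state names, the Lambda ARNs, and the boolean `needsReview` field that the entity step is assumed to emit are all hypothetical; only the ASL fields themselves (`StartAt`, `Task`, `Choice`, `Retry`, and so on) come from the Step Functions language. The `Retry` block illustrates the "less code" point: retries with backoff are declared rather than hand-written.

```python
import json

# Hypothetical Lambda function ARNs (account ID and function names are placeholders).
LAMBDA_ARNS = {
    "ExtractText": "arn:aws:lambda:us-east-1:111122223333:function:extract-text",
    "ExtractEntities": "arn:aws:lambda:us-east-1:111122223333:function:extract-entities",
    "StartHumanReview": "arn:aws:lambda:us-east-1:111122223333:function:start-human-review",
    "ConsolidateData": "arn:aws:lambda:us-east-1:111122223333:function:consolidate-data",
    "StoreData": "arn:aws:lambda:us-east-1:111122223333:function:store-data",
}

# Retry failed tasks with exponential backoff, declared instead of hand-coded.
RETRY = [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2,
          "MaxAttempts": 3, "BackoffRate": 2.0}]


def task(resource, next_state=None):
    """Build a Task state that either moves to next_state or ends the workflow."""
    state = {"Type": "Task", "Resource": resource, "Retry": RETRY}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_state_machine(arns):
    """Return an Amazon States Language definition for the document workflow."""
    return {
        "Comment": "Sketch of the document processing workflow",
        "StartAt": "ExtractText",
        "States": {
            "ExtractText": task(arns["ExtractText"], "ExtractEntities"),
            "ExtractEntities": task(arns["ExtractEntities"], "CheckReviewNeeded"),
            # Branch on a boolean the entity step is assumed to add to its output.
            "CheckReviewNeeded": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.needsReview", "BooleanEquals": True,
                             "Next": "StartHumanReview"}],
                "Default": "ConsolidateData",
            },
            "StartHumanReview": task(arns["StartHumanReview"], "ConsolidateData"),
            "ConsolidateData": task(arns["ConsolidateData"], "StoreData"),
            "StoreData": task(arns["StoreData"]),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_state_machine(LAMBDA_ARNS), indent=2))
```

The serialized JSON could be passed as the `definition` argument of boto3's `create_state_machine`. In a production workflow, the review step would also need to wait for the reviewer to finish (for example, with a Step Functions callback task using `.waitForTaskToken`); that part is omitted from this sketch.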

AWS Lambda functions are used to:

  • Detect if all required documents for a given transaction are available in Amazon S3
  • Kick off the process by creating an Amazon SQS message
  • Detect a new processing job from a generated SQS message
  • Extract text from PDFs (as a task in the Step Functions workflow)
  • Extract entities from the generated text (as a task in the Step Functions workflow)
  • Control data completeness and accuracy
  • Initiate a human loop when needed (as a task in the Step Functions workflow)
  • Consolidate the data collected from documents
  • Store the data into the database
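The first two Lambda responsibilities above (detecting that a transaction is complete and creating the SQS job message) can be sketched as pure functions. This sketch assumes a hypothetical key layout of `raw/<transaction_id>/<doc_type>.pdf` and hypothetical document type names `invoice` and `customs_authorization`, mirroring the post's example of an invoice plus a customs authorization form. The message schema is also an assumption.

```python
import json
from collections import defaultdict

# Document types required per transaction, following the post's example.
# The names themselves are hypothetical.
REQUIRED_DOC_TYPES = frozenset({"invoice", "customs_authorization"})


def group_by_transaction(keys, prefix="raw"):
    """Group keys of the assumed form <prefix>/<transaction_id>/<doc_type>.pdf."""
    found = defaultdict(set)
    for key in keys:
        parts = key.split("/")
        if len(parts) != 3 or parts[0] != prefix or not parts[2].endswith(".pdf"):
            continue  # ignore keys that do not follow the assumed layout
        found[parts[1]].add(parts[2][: -len(".pdf")])
    return found


def complete_transactions(keys, required=REQUIRED_DOC_TYPES):
    """Return the IDs of transactions whose required documents are all present."""
    return sorted(t for t, docs in group_by_transaction(keys).items() if required <= docs)


def build_job_message(bucket, transaction_id, required=REQUIRED_DOC_TYPES):
    """Build a JSON body for the SQS job message (hypothetical schema)."""
    return json.dumps({
        "bucket": bucket,
        "transactionId": transaction_id,
        "documents": {d: f"raw/{transaction_id}/{d}.pdf" for d in sorted(required)},
    })
```

Inside a Lambda function, the key list could come from an S3 `list_objects_v2` paginator, and each message body could be sent with boto3's `sqs.send_message(QueueUrl=..., MessageBody=...)`. Keeping the completeness check free of AWS calls makes it easy to unit test.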

Document ingestion and classification

There are several data ingestion options available, such as AWS Transfer Family, AWS DataSync, and Amazon Kinesis Data Firehose. Choose the appropriate ingestion blueprint based on the type of data source. Typical real-time ingestion blueprints combine AWS Lambda processing with an Amazon CloudWatch event. Batch pipelines can use AWS Step Functions to orchestrate the Lambda function that initiates the document processing workflow.
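For the real-time path, the Lambda function needs to pull the bucket and object key out of its triggering event. The sketch below assumes S3 event notifications delivered directly to Lambda (the `Records[].s3` shape); events routed through Amazon CloudWatch Events/Amazon EventBridge use a different payload layout, so this parser would need adjusting there. The `lambda_handler` return value is a hypothetical placeholder for whatever the next step expects.

```python
from urllib.parse import unquote_plus


def created_objects(event):
    """Return (bucket, key) pairs for ObjectCreated records in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        # Skip deletions and other event types; only new uploads should trigger processing.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in the notification (for example, spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; later steps would check document completeness."""
    return {"objects": created_objects(event)}
```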

Here are some things to consider when building your document ingestion and storage solution:

  • Choose your bucket strategy. Amazon S3 is an object store. Analyze your data ingestion pipeline carefully and choose the correct S3 bucket strategy for each document type (bills, supplier invoices, and others).
  • Organize your data. The data is organized in S3 buckets by layer: Raw, Staging, and Processed. Each layer has its own bucket policy and access control.
  • Build a creation tool. This is an automated tool that creates the data lake bucket and folder structure based on your data ingestion requirements. You can use the same structure for user-created data.
  • Define data security requirements. Do this before you begin the ingestion process: before ingesting new or existing data sources into AWS, secure access to the data.
  • Review the security credentials needed for access. After copying these credentials into AWS Systems Manager (SSM), encrypt them with an AWS Key Management Service (KMS) key. The encrypted string is stored in SSM and used for authentication.
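The bucket/folder creation tool described above could start as small as the following sketch, which derives one bucket per layer (Raw, Staging, Processed) and one prefix per document type. The `<project>-<layer>` naming scheme and the project name are hypothetical, and the bucket-name check is a simplified subset of the full S3 naming rules.

```python
import re

LAYERS = ("raw", "staging", "processed")

# Simplified S3 bucket-name check: 3-63 chars, lowercase letters, digits, dots,
# and hyphens, starting and ending with a letter or digit. The full rules are stricter.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def build_layout(project, doc_types, layers=LAYERS):
    """Map each layer bucket (hypothetical naming) to one prefix per document type."""
    layout = {}
    for layer in layers:
        bucket = f"{project}-{layer}"
        if not BUCKET_NAME_RE.match(bucket):
            raise ValueError(f"invalid bucket name: {bucket}")
        layout[bucket] = [f"{doc_type}/" for doc_type in sorted(doc_types)]
    return layout
```

Because S3 has no real folders, the prefixes could be materialized as zero-byte marker objects with boto3's `s3.put_object(Bucket=bucket, Key=prefix)`, and each layer bucket can then carry its own bucket policy.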

Document processing workflow


The workflow checks the input buckets until it detects all the document types necessary for a complete dataset. In our case, these are the invoice document and the customs authorization form. Once both are detected, it generates a job request as a message in Amazon SQS. A Lambda function then processes the message and starts the Step Functions workflow (see Figure 2). The state machine then initiates the document processing, text extraction, and optional human review steps. AWS Step Functions is well suited to our use case because it can manage long-running workflows.

Figure 2. Visual workflow of document processing in AWS Step Functions
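The hand-off from Amazon SQS to Step Functions could look like the following sketch. It assumes the message body is JSON with a `transactionId` field (a hypothetical schema) and takes the Step Functions client as a parameter so it can be tested without AWS; in Lambda you would pass `boto3.client("stepfunctions")`. `start_execution` with `stateMachineArn`, `name`, and `input` is the real boto3 call.

```python
import json
import re


def execution_name(transaction_id):
    """Derive an execution name from the transaction ID (80-character limit)."""
    # Replace anything outside a conservative character set to stay within naming rules.
    return re.sub(r"[^A-Za-z0-9_-]", "-", f"doc-{transaction_id}")[:80]


def handle_sqs_event(event, sfn_client, state_machine_arn):
    """Start one workflow execution per SQS record and return the execution ARNs."""
    arns = []
    for record in event["Records"]:
        job = json.loads(record["body"])
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            name=execution_name(job["transactionId"]),
            input=json.dumps(job),
        )
        arns.append(response["executionArn"])
    return arns
```

Naming executions after the transaction ID is a design choice: Step Functions rejects a second execution with the same name on a standard state machine, which helps guard against duplicate processing if the same SQS message is delivered twice.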

Entity extraction

For each document, entities are extracted using Amazon Textract and Amazon Comprehend. These entities can include date, company, address, bill of materials, total cost, and invoice number.

Figure 3 shows a sample invoice document that is fed to Amazon Textract, which extracts the form data and creates key-value pairs.

Figure 3. Highlighted different entities in the sample invoice document

See Figure 4 for an example of the key-value pairs extracted from the sample invoice. The keys represent the form labels (such as “SHIP TO”) and the values represent the form values (such as the shipping address).

Figure 4. Key-value pairs of the invoice data, extracted by Amazon Textract
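The key-value pairs in Figure 4 are assembled from the `Blocks` list that Amazon Textract returns when forms analysis is requested (for example, `analyze_document` with `FeatureTypes=["FORMS"]`, or the asynchronous `start_document_analysis`/`get_document_analysis` pair for multi-page PDFs). The following is a minimal sketch of that assembly: `KEY_VALUE_SET` blocks tagged `KEY` point to their value block through a `VALUE` relationship, and both point to their words through `CHILD` relationships. Representing selected check boxes as `"X"` is an assumption of this sketch.

```python
def _block_text(block, blocks_by_id):
    """Concatenate the WORD (and selected check box) children of a block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT" and child.get("SelectionStatus") == "SELECTED":
                words.append("X")
    return " ".join(words)


def key_value_pairs(response):
    """Map form labels to form values from a Textract forms-analysis response."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(_block_text(blocks_by_id[v], blocks_by_id) for v in rel["Ids"])
        # A label that appears twice (for example, two "ADDRESS" fields) keeps the last value.
        pairs[_block_text(block, blocks_by_id)] = value_text
    return pairs
```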

Amazon Textract also generates a raw text output that contains the entire text of the document, as shown in Figure 5.

Figure 5. Raw text output of the invoice data extracted by Amazon Textract
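The raw text in Figure 5 can be rebuilt from the `LINE` blocks of the same Textract response. Because Amazon Comprehend limits the size of each synchronous request, the sketch also splits the text on line boundaries into chunks of at most `max_bytes` UTF-8 bytes. The limit is deliberately a parameter: take the actual value from the Amazon Comprehend quotas documentation rather than from this example.

```python
def raw_text(response):
    """Rebuild the raw text from LINE blocks, in the order Textract returns them."""
    return "\n".join(b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE")


def chunk_lines(text, max_bytes):
    """Split text on line boundaries into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current, size = [], [], 0
    for line in text.split("\n"):
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError("a single line is larger than max_bytes")
        line_size = line_bytes + 1  # +1 for the newline that joins it to the next line
        if current and size + line_size > max_bytes:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting on line boundaries keeps each address or company name in one piece, which matters when the text is later sent to the entity recognizer.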

To achieve a higher degree of confidence, Amazon Comprehend is used to identify and extract the custom entities. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to identify and extract insights and entities from text data. You can train Amazon Comprehend to identify entities relevant to your organization, such as product names, part numbers, department names, or other entities. You can also train Amazon Comprehend to categorize documents or assign relevant labels to text.

An Amazon Comprehend entity recognizer comes with a set of pre-built entity types. You can also add custom entity types to match your specific business needs. Some of the entities we want to identify are addresses and company names, so we trained a custom recognizer to detect company names and addresses (see Figure 6).

Figure 6. Training details of custom entity recognizer
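Training a custom recognizer like the one in Figure 6 requires training documents plus either annotations or an entity list. The following sketch prepares the entity-list variant: a CSV file with `Text` and `Type` columns and the parameters for boto3's `comprehend.create_entity_recognizer`. The entity type names `COMPANY` and `ADDRESS`, the recognizer name, the role ARN, and the S3 URIs are hypothetical; check the Amazon Comprehend documentation for the minimum amount of training data each entity type needs.

```python
import csv
import io

ENTITY_TYPES = ("COMPANY", "ADDRESS")  # hypothetical custom entity type names


def entity_list_csv(examples):
    """Build entity-list CSV content from (text, entity_type) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    for text, entity_type in examples:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        writer.writerow([text, entity_type])  # addresses with commas are quoted automatically
    return buf.getvalue()


def recognizer_request(name, role_arn, documents_uri, entity_list_uri):
    """Keyword arguments for comprehend.create_entity_recognizer (values are placeholders)."""
    return {
        "RecognizerName": name,
        "LanguageCode": "en",
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {
            "DataFormat": "COMPREHEND_CSV",
            "EntityTypes": [{"Type": t} for t in ENTITY_TYPES],
            "Documents": {"S3Uri": documents_uri},
            "EntityList": {"S3Uri": entity_list_uri},
        },
    }
```

After uploading the CSV and the training documents to S3, the request could be submitted with `boto3.client("comprehend").create_entity_recognizer(**recognizer_request(...))`.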

Figure 7 shows the resulting output from Amazon Comprehend:

Figure 7. Amazon Comprehend entity recognition output

The document is processed top-down and from left to right, as in the sample invoice in Figure 3. So we know that the first company and first address belong to the billing company, and the second set belongs to the shipment recipient. Along with detecting custom entities, Amazon Comprehend also outputs a confidence score for each extracted result.
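The positional rule above can be sketched as follows: sort the recognized entities by their `BeginOffset` (a field Amazon Comprehend returns for each entity) and assign the first company and address to the billing party and the second to the shipment recipient. This assumes the text was produced in top-down reading order and that the custom entity types are named `COMPANY` and `ADDRESS`, both assumptions for illustration.

```python
ROLES = ("billing", "shipment_recipient")


def _nth_text(entities, n):
    """Text of the n-th entity, or None if fewer than n + 1 were found."""
    return entities[n]["Text"] if n < len(entities) else None


def assign_parties(entities):
    """Assign companies and addresses to parties by their order in the text."""
    ordered = sorted(entities, key=lambda e: e["BeginOffset"])
    companies = [e for e in ordered if e["Type"] == "COMPANY"]
    addresses = [e for e in ordered if e["Type"] == "ADDRESS"]
    return {
        role: {"company": _nth_text(companies, i), "address": _nth_text(addresses, i)}
        for i, role in enumerate(ROLES)
    }
```

A missing entity yields `None` rather than an exception, which leaves the decision about whether to send the document to a reviewer to a later step.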

Confidence scores can vary depending on how closely the training data matches the actual data. In the preceding example, the first company entity came back with a score of 0.941. Let’s assume that we have set a minimum confidence score of 0.95. Anything below that threshold should be reviewed by a human. The following section describes the last step of our workflow.
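The threshold rule can be expressed directly over the `Entities` list that Amazon Comprehend returns (for a custom recognizer, via `comprehend.detect_entities(Text=..., EndpointArn=...)`), where each entity carries a `Score`. This minimal sketch uses the 0.95 threshold assumed above; the shape of the returned dictionary is hypothetical.

```python
MIN_CONFIDENCE = 0.95  # minimum score assumed in the example above


def route_entities(entities, threshold=MIN_CONFIDENCE):
    """Split entities into auto-accepted ones and ones that need human review."""
    accepted = [e for e in entities if e["Score"] >= threshold]
    review = [e for e in entities if e["Score"] < threshold]
    return {"accepted": accepted, "review": review, "needsReview": bool(review)}
```

With this rule, the company entity scored at 0.941 lands in the review list, while an entity scored exactly at 0.95 is accepted.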

Human review

Amazon Augmented AI (A2I) allows you to create and manage human loops. A human loop is a manual review task that is assigned to a workforce. The workforce can be public, such as Amazon Mechanical Turk, or private, such as an internal team or a paid contractor. In our example, we created a private workforce to review the entities we were not confident about. Figure 8 shows an example of the user interface that reviewers use to assign entities to the proper text sections.

Figure 8. Manual review interface of Amazon A2I
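Starting a human loop programmatically goes through the A2I runtime client (`boto3.client("sagemaker-a2i-runtime")`) and its `start_human_loop` call, which takes a loop name, the flow definition ARN of your review workflow, and JSON input content. In this sketch, the client is passed in so it can be tested without AWS, and the `text`/`entities` input keys are hypothetical: they must match whatever your worker task template reads.

```python
import json
import re
import uuid


def human_loop_name(transaction_id):
    """Build a unique loop name: lowercase letters, digits, and hyphens, at most 63 chars."""
    base = re.sub(r"[^a-z0-9]+", "-", transaction_id.lower()).strip("-")
    suffix = uuid.uuid4().hex[:8]  # keeps names unique across retries of the same transaction
    return f"{base[:54]}-{suffix}".strip("-")


def start_review(a2i_client, flow_definition_arn, transaction_id, text, entities):
    """Start a human loop for low-confidence entities and return its ARN."""
    response = a2i_client.start_human_loop(
        HumanLoopName=human_loop_name(transaction_id),
        FlowDefinitionArn=flow_definition_arn,
        HumanLoopInput={"InputContent": json.dumps({"text": text, "entities": entities})},
    )
    return response["HumanLoopArn"]
```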

Review tasks can be submitted to the workforce automatically, based on dynamic criteria, after both AI-related steps are completed. Human review can be used to check the text detected by Amazon Textract when key data elements (such as order amount or quantity) are missing. It can also be used to review entities after Amazon Comprehend is invoked.
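One possible dynamic criterion combines both checks described above: flag the document when required form fields are missing from the Textract key-value pairs, or when any entity fell below the confidence threshold. The required labels `ORDER AMOUNT` and `QUANTITY` are hypothetical stand-ins for whatever your forms use, and the label normalization is a simple assumption.

```python
# Hypothetical form labels that must be present before data is accepted automatically.
REQUIRED_FIELDS = ("ORDER AMOUNT", "QUANTITY")


def normalize_label(label):
    """Normalize a form label such as 'Order Amount:' to 'ORDER AMOUNT'."""
    return " ".join(label.replace(":", " ").split()).upper()


def missing_fields(pairs, required=REQUIRED_FIELDS):
    """Return required labels that are absent or have an empty value."""
    normalized = {normalize_label(k): v.strip() for k, v in pairs.items()}
    return [field for field in required if not normalized.get(field)]


def needs_review(pairs, low_confidence_entities):
    """Send to review if key data is missing or any entity scored below the threshold."""
    return bool(missing_fields(pairs)) or bool(low_confidence_entities)
```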

Figure 9. Consolidated dataset of processed invoice and customs authorization data

After the manual review step, the data can be consolidated (as shown in Figure 9) and stored in a relational database. It can also be shared with other business units, such as Accounting or Customer Service. You can apply the same process to other document types, such as customs forms, that are linked to the same transaction. This allows us to process and combine information from disparate paper sources more efficiently.
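As an illustration of the consolidation step, the sketch below merges invoice and customs fields into one row per transaction and upserts it into a relational table. SQLite stands in for whichever relational database you use, and the table name, column names, and input field names are all hypothetical, since the actual columns would follow your consolidated dataset.

```python
import sqlite3

# Hypothetical schema for the consolidated order record.
SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (
    transaction_id TEXT PRIMARY KEY,
    invoice_number TEXT,
    total_cost TEXT,
    billing_company TEXT,
    ship_to_company TEXT,
    customs_authorization TEXT
)
"""


def consolidate(transaction_id, invoice, customs):
    """Merge reviewed invoice and customs fields into one record (field names assumed)."""
    return {
        "transaction_id": transaction_id,
        "invoice_number": invoice.get("invoice_number"),
        "total_cost": invoice.get("total_cost"),
        "billing_company": invoice.get("billing_company"),
        "ship_to_company": invoice.get("ship_to_company"),
        "customs_authorization": customs.get("authorization_number"),
    }


def store(conn, record):
    """Insert the record, replacing any earlier row for the same transaction."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO orders VALUES (:transaction_id, :invoice_number, "
        ":total_cost, :billing_company, :ship_to_company, :customs_authorization)",
        record,
    )
    conn.commit()
```

Keying the table on the transaction ID means that reprocessing a transaction (for example, after a reviewer corrects a value) updates the existing row instead of adding a duplicate.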


This post demonstrates how to automate the processing of business documentation by using Amazon Textract, Amazon Comprehend, and Amazon Augmented AI.

Deploying an automated solution in the logistics industry removes the undifferentiated heavy lifting of manual document processing. This can help reduce delivery delays and track missed deliveries. By providing a comprehensive view of each shipment, it increases the efficiency of back-office processing. It can also simplify data collection for audit purposes.

To learn more: