Companies often receive many invoices in a similar format that must be posted to an external system such as SAP. Opening each invoice, extracting the needed information, and entering it into SAP takes a lot of time. This sample shows how to extract data from both searchable and image-only PDFs using OCR and XPath.
The difference between searchable and image-only PDFs is that the text in a searchable PDF can be selected, copied, and marked up. So, instead of using OCR, the bot can open a searchable PDF in the Mozilla Firefox browser and copy the required information using XPaths.
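As a rough illustration of the XPath approach, the sketch below pulls fields out of a small piece of markup with Python's standard library. The markup, the element ids, and the field names are hypothetical stand-ins for whatever the browser actually renders for your invoices, and the sample itself drives Firefox rather than parsing a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the text layer a browser
# renders for a searchable PDF. Real invoices will look different.
PAGE = """
<div class="page">
  <div class="textLayer">
    <span id="invoice-number">INV-0001</span>
    <span id="invoice-date">2020-01-15</span>
    <span id="total-amount">1250.00</span>
  </div>
</div>
"""

# Hypothetical mapping from field name to the XPath that locates it.
FIELD_XPATHS = {
    "invoice_number": ".//span[@id='invoice-number']",
    "invoice_date": ".//span[@id='invoice-date']",
    "total_amount": ".//span[@id='total-amount']",
}


def extract_fields(markup, xpaths):
    """Return {field: text} for each XPath, or None when nothing matches."""
    root = ET.fromstring(markup)
    result = {}
    for name, xpath in xpaths.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        result[name] = node.text.strip() if has_text else None
    return result


if __name__ == "__main__":
    for field, value in extract_fields(PAGE, FIELD_XPATHS).items():
        print(field, "=", value)
```

Because the invoices share a similar format, one set of XPaths can be reused for every document of that type. Returning None for a missing field makes a layout change easy to detect.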
Without automation, an employee has to open each PDF manually, copy the required information, log in to SAP, find the MIRO transaction, and post the copied data into the system. This usually takes about 15 minutes per invoice.
Instead of extracting all the required information from invoices manually, the bot scrapes it using optical character recognition (OCR) for image-only PDFs and XPaths for searchable PDFs. Then the bot posts this information to your SAP system.
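The choice between the two extraction paths can be sketched as follows. This is a minimal illustration and not the sample's actual logic: the threshold, the field labels, and the regular expressions are assumptions, and the input strings stand in for text obtained from a PDF text layer or from an OCR engine.

```python
import re

# Hypothetical patterns for fields found in OCR output. The labels
# ("Invoice No", "Total") are assumptions about the invoice layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "total_amount": re.compile(r"Total\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def has_text_layer(extracted_text, min_chars=20):
    """Heuristic: a PDF counts as searchable when direct text extraction
    yields at least min_chars non-whitespace characters."""
    return len("".join(extracted_text.split())) >= min_chars


def choose_strategy(extracted_text):
    """Return "xpath" for searchable PDFs and "ocr" for image-only ones."""
    return "xpath" if has_text_layer(extracted_text) else "ocr"


def parse_ocr_text(ocr_text):
    """Pull known fields out of raw OCR text; unmatched fields are None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    sample = "Invoice No: INV-0001\nDate: 2020-01-15\nTotal: 1,250.00"
    print(choose_strategy(""))      # image-only PDF: no text layer
    print(choose_strategy(sample))  # searchable PDF
    print(parse_ocr_text(sample))
```

OCR output tends to be noisier than a text layer, so the patterns are deliberately tolerant of spacing and punctuation around the labels.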
Example Overview Video
See the video showing how our bot works.
Installation and Getting Started
Extract the sample folder and drop it into your default Workspace in WorkFusion Studio (C:\Users\username\workfusion-workspace\rpae_project).
Open the sample in WorkFusion Studio.
Go to Window > Preferences, expand WorkFusion Studio, and select the Secrets Vault tab.
Press the Add button and enter your secret entry values:
Alias – SAP_creds
Key – your SAP username
Value – your SAP password
Press Apply and Close.
Copy input files from the Invoices folder to any appropriate folder and edit the following variables:
folder_path – the path to the input files with invoices in PDF format
sap_path – the path to saplogon.exe
tax_code – the tax code in your SAP GUI, which is set here
Set the default zoom level for Adobe Acrobat Reader to 100%.
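Before running the bot, it can help to sanity-check the three variables. The sketch below is a standalone Python illustration, not part of the sample: the variable names come from the list above, while the SampleConfig class, the helper functions, and the example values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class SampleConfig:
    folder_path: Path  # folder with the input invoices in PDF format
    sap_path: Path     # full path to saplogon.exe
    tax_code: str      # tax code as configured in your SAP GUI


def list_invoices(folder: Path) -> List[Path]:
    """Return the PDF files in folder, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def validate(config: SampleConfig) -> List[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if not config.folder_path.is_dir():
        problems.append(f"folder_path is not a folder: {config.folder_path}")
    elif not list_invoices(config.folder_path):
        problems.append(f"no PDF files found in {config.folder_path}")
    if config.sap_path.name.lower() != "saplogon.exe":
        problems.append(f"sap_path does not point to saplogon.exe: {config.sap_path}")
    elif not config.sap_path.is_file():
        problems.append(f"saplogon.exe not found at {config.sap_path}")
    if not config.tax_code.strip():
        problems.append("tax_code is empty")
    return problems
```

For example, calling validate(SampleConfig(Path("C:/invoices"), Path("C:/SAP/saplogon.exe"), "V0")) returns an empty list when the folder contains at least one PDF and the executable exists at that location. The paths and the tax code shown here are placeholders.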