Introducing SharePoint Syntex: Form Processing and Document Understanding
Following its formal introduction at Microsoft’s Ignite 2020 conference, we are finally able to talk publicly about SharePoint Syntex – the first product to be released from Project Cortex. Our team at Intelogy has been lucky enough to have had early access to SharePoint Syntex, giving us a chance to explore some of the powerful features it is bringing to Microsoft 365.
For those of you who have been following Project Cortex since its initial announcement at Ignite 2019, SharePoint Syntex is just the first of the products that will be launched in this area – another product (name TBD) focusing on knowledge curation and discovery is following hot on the heels of SharePoint Syntex and should be launched soon.
What is SharePoint Syntex?
Available from the beginning of October, SharePoint Syntex will provide you with the ability to automate the extraction of metadata from your files. Having this capability hosted directly within Microsoft 365 will allow SharePoint Syntex’s Content Center to become the heart of your organisation’s content processing.
To put it simply, SharePoint Syntex will make it easy for you to build processes that automatically capture and process content. Let’s imagine you receive a number of invoices and purchase orders: SharePoint Syntex makes it easy for you to configure AI-powered processes that extract financial information and even automatically apply sensitivity and retention labels to your financial files. Of course, Syntex isn’t just for your invoices – you can apply it to the types of content that are central to your organisation.
From our work during the private preview, we feel that SharePoint Syntex will significantly reduce the manual processing of content – resulting in tangible efficiency savings.
Syntex brings together two major components: Forms Processing and Document Understanding. Both are used to extract metadata from content, but they have been designed to work in different situations, so let’s dive in and take a closer look at the capabilities of each.
Form Processing with SharePoint Syntex
Perfect for extracting metadata from consistently structured files such as invoices and surveys, Forms Processing is great in situations where you need to capture information from large volumes of structured PDF files and images.
Built upon the AI Builder component of the Microsoft Power Platform, SharePoint Syntex’s Forms Processing capability allows us to set up the precise location of fields on a given form. Once a model has been built, AI Builder will automatically read and extract metadata from the defined field locations in any files that are subsequently uploaded.
This extracted information is stored within SharePoint columns – meaning that many of the benefits of classifying content can be immediately applied to your forms. Not only will you be able to sort and group by the values that have been automatically extracted, you will also be able to use search to retrieve content by captured metadata.
In any scenario where you wish to extract metadata from a relatively large number of known, consistent locations on files, SharePoint Syntex’s Forms Processing tool looks to be an excellent solution to help automate your processes.
SharePoint Syntex’s Forms Processing capability will only improve over time – I personally can’t wait to see the ability to handle tabular data, such as the individual line items on an invoice or purchase order.
Document Understanding with SharePoint Syntex
For me, SharePoint Syntex’s Document Understanding functionality has huge potential – it offers us a glimpse of the way all content might be processed in the not-too-distant future. Document Understanding offers AI-driven capability, through machine teaching, to automatically classify files and extract metadata from unstructured content.
To get the most out of Document Understanding you really need to have consistent types of unstructured files. You can’t point this tool at unrelated unstructured files; you need to have, for example, a collection of proposals or contract letters. You’ll need to create a separate ‘Document Understanding Model’ for each different type of file that you wish to process. Realistically you probably need at least a few hundred files of the same type to really see the value of this functionality.
Document Understanding revolves around two key concepts – ‘classifiers’, which are trained to identify all of the files of a given type that are uploaded into a library, and ‘extractors’, which capture metadata that matches defined phrases or patterns:
A ‘phrase’ provides a combination of keywords or characters, which the model uses to help to locate relevant metadata.
A ‘pattern’ is used to identify a specific format of characters and numbers – such as a date or credit card number. A selection of pre-formatted patterns is provided to make the process straightforward.
Once you’ve created phrases and patterns, Document Understanding allows you to define a ‘proximity’, i.e. how far apart you expect these parameters to be. For example, proximity allows you to look for a membership number that has a ‘phrase’ of “Member number:” closely preceding an expected number ‘pattern’.
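To make the idea of a phrase, a pattern and a proximity more concrete, here is a minimal Python sketch of the concept. It is purely illustrative and is not how SharePoint Syntex is implemented or configured: the function name, the six-digit number format and the three-token proximity are all assumptions made up for this example.

```python
import re

# Hypothetical illustration only: not SharePoint Syntex's implementation or API.
PHRASE = "Member number:"            # the 'phrase' that signals nearby metadata
PATTERN = re.compile(r"\b\d{6}\b")   # the 'pattern' (assumed: a six-digit number)
MAX_TOKENS_APART = 3                 # the 'proximity' (assumed: within 3 tokens)


def extract_member_number(text, phrase=PHRASE, pattern=PATTERN,
                          proximity=MAX_TOKENS_APART):
    """Return the first pattern match that closely follows the phrase, or None."""
    start = text.find(phrase)
    while start != -1:
        after = text[start + len(phrase):]
        # Only consider the few tokens that immediately follow the phrase.
        window = " ".join(after.split()[:proximity])
        match = pattern.search(window)
        if match:
            return match.group(0)
        # Try the next occurrence of the phrase, if there is one.
        start = text.find(phrase, start + 1)
    return None
```

With these assumed settings, a letter containing “Member number: 482913” would yield the number, whereas a six-digit number appearing many words after the phrase, or with no phrase at all, would be ignored. That is the point of proximity: the phrase and the pattern only count as a match when they sit close together.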
One of the best new features provided by SharePoint Syntex is the ability to automatically apply both Sensitivity and Retention labels to content. This is a great new feature, which extends the number of ways we have of ensuring that information protection and compliance can be baked into systems – all while minimising effort for users.
Which approach to use?
Which of these approaches is best for you will be determined by the data you are looking to extract. If the files you are processing are images or PDFs and have a rigid and consistent structure, then Forms Processing is clearly the way to go. However, if you are looking to capture metadata from largely unstructured Office files and PDFs, then Document Understanding will be your tool of choice.
Looking to the future
This week’s announcement of the release of SharePoint Syntex is clearly the start of an exciting journey. Over the next few years I’m looking forward to the promise of automated AI processing that not only reads your content but actively understands its meaning, to the extent that all of our files can be subject to automated classification.
I’m excited to see where this leads – but perhaps a future of automated classification and compliance across all content isn’t that far from reality.
Having defined extensive Microsoft 365 EDRM systems and bespoke enterprise intranets, I specialise in overseeing cutting edge solutions that are tailored to meet customer needs. Providing leading expertise within the Information Management field, I enjoy helping organisations on their journey towards compliance.