In Microsoft 365 it’s really easy for users to create new Sites and Teams. By default, any user can spin up as many new working areas as they want, give them any name they choose and invite any colleagues they wish to join them as members. This results in a bit of a free-for-all – with hundreds or thousands of workspaces being created, without any easy way of determining their purpose, value, or even the owning department.
The first step is to prevent users from creating new workspaces by turning off self-service group creation. However, this cannot be done in isolation – as turning off the tap will quickly cause pressure to build as staff find alternative collaboration solutions. As such, when you disable self-service group creation, you will want to simultaneously introduce a new provisioning process.
The aims of this new provisioning solution will vary between organisations, but most want to ensure that their workspaces are configured consistently. There are lots of different approaches you can take here, but I would suggest that you start by identifying your objectives. The primary objectives of provisioning usually include:
- Supporting multiple ways of working – your provisioning process can include a series of different ‘templates’, each of which is optimised to support different types of activity across the organisation. As such, your provisioning process can utilise templates that provide a consistent starting point for the structure of your projects, committees and departments (and other types of work that are common across your organisation).
- Content classification – a provisioning process allows you to identify and apply appropriate default metadata to your libraries and folders. Each file will then be tagged ‘by-stealth’, simply based on where it has been saved. The idea here is to ensure that content is automatically tagged when it is first created, reducing effort for staff, while significantly improving the ability to find and manage information.
- Context – controlling the provisioning process offers you a unique opportunity to compile information about the context of the Site/Team, making it far easier for you to appraise its value in the future. If you choose to, you can even present some of this context back to your users – perhaps through naming conventions, descriptions or even replacing the workspace’s default image – which can help your staff become more confident about the purpose, ownership and security of their workspaces.
- Information Protection – you can easily weave sensitivity labels and even data loss prevention policies into your templates, so that content that requires a higher level of control can be protected automatically.
- Retention – one of the core aims of controlling provisioning is to ensure that all content is automatically included within the scope of your records management strategy. By integrating retention labels and policies into your provisioning process you can make certain that all of your records are governed across new workspaces.
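To make the objectives above a little more concrete, here is a minimal sketch of how a set of templates might be modelled before any workspace is actually created. Everything in it is an assumption for illustration: the `WorkspaceTemplate` and `WorkspaceRequest` types, the template keys, the naming convention and the label names are hypothetical, and the sketch does not call any Microsoft 365 API. It only shows how a request form plus a template can produce a consistent name, description, default metadata and labels.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class WorkspaceTemplate:
    """Hypothetical template: defaults applied to every workspace of one type."""
    key: str
    name_prefix: str
    default_metadata: Dict[str, str] = field(default_factory=dict)
    sensitivity_label: str = "General"   # assumed label name
    retention_label: str = "Standard"    # assumed label name


@dataclass(frozen=True)
class WorkspaceRequest:
    """Hypothetical request form: the context captured from the requester."""
    title: str
    owning_department: str
    purpose: str
    template_key: str


# Illustrative templates only; real ones would reflect your own ways of working.
TEMPLATES = {
    "project": WorkspaceTemplate(
        key="project",
        name_prefix="PRJ",
        default_metadata={"WorkType": "Project"},
        sensitivity_label="Internal",
        retention_label="Project records",
    ),
    "committee": WorkspaceTemplate(
        key="committee",
        name_prefix="COM",
        default_metadata={"WorkType": "Committee"},
        sensitivity_label="Confidential",
        retention_label="Committee records",
    ),
}


def plan_workspace(request: WorkspaceRequest,
                   templates: Dict[str, WorkspaceTemplate] = TEMPLATES) -> dict:
    """Combine a request with its template into a provisioning plan."""
    template = templates[request.template_key]
    return {
        # Naming convention presents context (type and owner) back to users.
        "display_name": f"{template.name_prefix}-{request.owning_department}-{request.title}",
        "description": request.purpose,
        # Default metadata tags content 'by stealth' based on where it is saved.
        "metadata": {**template.default_metadata,
                     "Department": request.owning_department},
        "sensitivity_label": template.sensitivity_label,
        "retention_label": template.retention_label,
    }
```

A request for a "project" workspace owned by Facilities and titled "Office Move" would come back with the display name `PRJ-Facilities-Office Move`, along with the project template's metadata and labels. The plan would then be handed to whatever provisioning mechanism you choose.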
Very often you will encounter resistance if you try to introduce controls around the creation of Sites and Teams. I’ve often heard people argue that the introduction of a provisioning process will impose barriers that delay or even impede users. This is very much not the objective!
Instead, our aim should be to ensure that the process of creating a new Site or Team is as simple as possible and that it introduces benefits for the whole organisation. Sure, you’ll need to introduce a new form that captures information about the nature of the working area that is being requested, and, naturally, filling out this form will slightly slow down the creation process. However, instead of focusing on the negatives, make sure to extol the benefits that you can introduce: not just the improved governance, information protection and reduced duplication, but also how much easier it will be for staff to search for and find well-classified content. I strongly believe that applying governance through a provisioning process will lead to significantly improved organisational efficiency in the medium to long term.
3. Undertake a high-level audit of your legacy content
Don’t worry, I haven’t forgotten! Fixing the issue with the dripping tap doesn’t fix the issue with the digital heap – but it certainly helps!
Once you’ve fixed the tap, the digital heap stops growing. From this point forward, every chunk you take out of the heap genuinely reduces it.
The first thing I’d recommend after turning off the tap is to undertake a high-level audit of your digital heap. Try to identify the volume of the data and the depth of your folders, and map this to the nature of the content and the part of the organisation that ‘owns’ each area. Automated tools can help with this, such as the SharePoint Migration Assessment Tool for SharePoint content, or DROID or TreeSize for file shares.
Some organisations decide to assemble a team who are tasked with working through the heap to assess or even migrate content into a different structure. This is a feasible, if costly, approach, which can prove effective if there is a pressing need or deadline involved. Personally, I find that while this process can make significant inroads, or even flatten your heap altogether, it is frequently too time-consuming for many organisations to countenance.
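If you don’t have one of those tools to hand, a rough first pass over a file share can be scripted. The following is a minimal sketch, assuming the share is reachable as an ordinary folder path and that each top-level folder roughly corresponds to an owning area; the `AreaSummary` type and `audit_share` function are names I’ve made up for illustration. It reports the file count, total size and deepest folder level for each top-level area.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass
class AreaSummary:
    """Volume and depth figures for one top-level folder of the share."""
    name: str
    file_count: int = 0
    total_bytes: int = 0
    max_depth: int = 0  # folder levels below the share root


def audit_share(root: str) -> Dict[str, AreaSummary]:
    """Walk the share and summarise each top-level folder."""
    root_path = Path(root)
    summaries: Dict[str, AreaSummary] = {}
    for dirpath, _dirnames, filenames in os.walk(root_path):
        rel_parts = Path(dirpath).relative_to(root_path).parts
        # Files sitting directly in the share root are grouped under "(root)".
        area = rel_parts[0] if rel_parts else "(root)"
        summary = summaries.setdefault(area, AreaSummary(name=area))
        summary.max_depth = max(summary.max_depth, len(rel_parts))
        for filename in filenames:
            try:
                size = (Path(dirpath) / filename).stat().st_size
            except OSError:
                # Skip files that cannot be read (permissions, broken links).
                continue
            summary.file_count += 1
            summary.total_bytes += size
    return summaries
```

Calling `audit_share` on the root of a share returns one summary per top-level folder, which you can then map by hand to the departments that own them. It says nothing about the nature of the content, so treat it as a starting point for the audit rather than the audit itself.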
4. Use various tools
Another approach is to look towards technology as a potential solution that can help you chip away at your digital heap. For content in Microsoft 365 we can make use of various tools to try to apply context to your content at scale, including:
- Trainable Classifiers – identify common types of content across your tenant and automatically apply retention and/or sensitivity labels to them. Trainable Classifiers allow you to take advantage of AI to automatically find consistent types of file. You train the classifier with at least 50 examples of the type of content and the AI will do the rest, by automatically scanning areas of your tenant and tagging files that are identified.
- Sensitive Information Types – another method of scanning your Microsoft 365 content at scale is to use Sensitive Information Types. These allow you to find content containing specific codes or reference numbers. They are especially useful when looking for content that contains personal information such as driver’s licence or passport numbers. Once content containing the code/number has been found, you can automatically apply either retention or sensitivity labels to it, helping to improve the governance of content across your tenant.
- Azure Information Protection unified labelling scanner – if areas of your digital heap are stored across file shares or on-premises SharePoint farms then the AIP scanner might be a useful tool to consider. The scanner allows you to automatically apply sensitivity labels by identifying content containing specified sensitive information types or regex patterns – perfect for extending the governance found in Microsoft 365 across your legacy data.
- SharePoint Syntex – a great tool for scanning files and automatically extracting metadata through AI. Best used for more consistently structured content (such as invoices and purchase orders), SharePoint Syntex allows you to build models that scan content and apply labels to the files they identify. If you want to find out more about SharePoint Syntex, check out my colleague Leon’s blog.
- Viva Topics – another workload that takes advantage of Microsoft’s AI capabilities, Viva Topics scans your content and identifies relationships in your existing data. The product automatically builds a knowledge network, using AI to identify key ‘topics’ – essentially it’s a bit like having an internal Wikipedia, built out of your existing content. While Topics certainly doesn’t replace a good information architecture, it presents an interesting option to automatically derive additional value from your legacy files.
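Several of the tools above rely on pattern matching to find codes or reference numbers, so it’s worth seeing how simple such a pattern can be. The following is a minimal sketch of the idea only, not of how Microsoft 365 or the AIP scanner evaluates content. The reference format (two capital letters, a four-digit year and a five-digit sequence, such as `HR-2021-00042`) and the `find_case_references` function are assumptions I’ve invented for illustration.

```python
import re

# Hypothetical case reference format: two capital letters, a four-digit
# year and a five-digit sequence number, e.g. "HR-2021-00042".
CASE_REF_PATTERN = re.compile(r"\b[A-Z]{2}-\d{4}-\d{5}\b")


def find_case_references(text: str) -> list:
    """Return every case reference found in the text, in order of appearance."""
    return CASE_REF_PATTERN.findall(text)
```

Running this against the text of a document gives you a list of matching references. A pattern like this would normally be tested against a sample of real files before it is used to drive any labelling, because an overly loose pattern will tag far more content than you intend.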
There are plenty of other technical solutions you can lean on to help you tackle your digital heap, with a wealth of 3rd party products available that scan, assess and classify your content. However, I should point out that you might need to combine several approaches, as each in isolation will only help resolve some of your legacy governance issues.
Finally, while ‘doing nothing’ to tackle your digital heap clearly isn’t a solution, I should point out that once you’ve fixed the dripping tap, your heap becomes easier to manage with each passing year. Frankly, as the information in the heap drifts from active to legacy, the process of making bulk decisions becomes much simpler. Now to be clear, I’m not suggesting that you can reach for the delete key and dispose of the entire heap – but it will become easier to identify areas of the heap that don’t have high value and perhaps even haven’t been accessed in several years – and use this information when making your decisions.
If you want to have a chat about your own challenges with the digital heap, feel free to throw questions my way – I’m always happy to try to steer you in the right direction.