At SpotDraft, we use machine learning to solve challenging problems on legal contracts. One such problem is the extraction of entities like dates, durations and geographical locations from contracts.
The general-purpose entities present in contracts are used to extract specific legal concepts like Effective Date, Termination Date, Jurisdiction, Notice Period, etc. Extracting this information from contracts allows users of our platform to manage and search through their contracts with ease. Without it, the usual way of finding this information is to open each document and press Ctrl+F for specific keywords.
Not only is this extremely time-consuming, it’s not guaranteed to work, as there is no common search phrase that will work across documents. The SpotDraft AI-extracted dealpoints solve this problem by not requiring such searches and allowing users to search across their repository of contracts.
These entities also form a core part of SpotDraft’s knowledge about a document, allowing us to build more complex features like smart redline and notifications.
To solve this problem, we use a transformer-based Named Entity Recognition (NER) model. This is done using a combination of two things:
This model (which we call SpotLegalBERT*) is trained on tens of thousands of internally tagged and verified data points created using real-world contracts from across the English-speaking world.
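To make the NER output concrete, here is a minimal sketch of how per-token tags from a token classification model can be grouped into entity spans. It assumes the common BIO tagging scheme and uses illustrative label names (DATE, LOCATION); the function `decode_bio`, the tag scheme and the sample sentence are assumptions for illustration, not a description of SpotLegalBERT's actual output format.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def close_span() -> None:
        # Emit the span that is currently open, if any.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            close_span()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span,
            # closes whatever span is open and starts nothing new.
            close_span()
            current_label = None
            current_tokens = []
    close_span()
    return entities


# Hypothetical model output for one contract sentence.
tokens = ["This", "Agreement", "is", "effective", "from", "1", "January",
          "2021", "and", "governed", "by", "the", "laws", "of", "India"]
tags = ["O", "O", "O", "O", "O", "B-DATE", "I-DATE",
        "I-DATE", "O", "O", "O", "O", "O", "O", "B-LOCATION"]

entities = decode_bio(tokens, tags)
print(entities)
```

Spans like these are the general-purpose entities that downstream logic could then map to concepts such as Effective Date or Jurisdiction.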
As part of the evaluation and training, we performed experiments to determine our model’s efficacy by comparing it to models trained using Google’s AutoML Entity Extraction and other transformer architectures.
We were pleasantly surprised to see that our model performs better than AutoML. The following is a comparison of SpotDraft’s (top) model performance and AutoML’s (bottom) model performance:
It is evident from the table above that our model is able to extract entities with far greater accuracy across all the labels. This is a very exciting achievement since AutoML performs hyperparameter tuning and iterative Neural Architecture Search to find the best possible model for the task.
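For readers who want to run a similar per-label comparison on their own models, the sketch below computes entity-level precision, recall and F1 for each label using exact span matching. The post does not say which metric or matching rule was used in the comparison, so the exact-match rule, the `per_label_scores` helper and the sample spans are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A span is (start_token, end_token, label); this layout is an assumption.
Span = Tuple[int, int, str]


def per_label_scores(gold: Set[Span], predicted: Set[Span]) -> Dict[str, Dict[str, float]]:
    """Entity-level precision, recall and F1 per label, exact span match."""
    # A prediction counts as correct only if boundaries and label both match.
    true_positives = Counter(span[2] for span in gold & predicted)
    gold_counts = Counter(span[2] for span in gold)
    predicted_counts = Counter(span[2] for span in predicted)

    scores: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(gold_counts) | set(predicted_counts)):
        tp = true_positives[label]
        precision = tp / predicted_counts[label] if predicted_counts[label] else 0.0
        recall = tp / gold_counts[label] if gold_counts[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Made-up spans, not SpotDraft data.
gold = {(5, 8, "DATE"), (14, 15, "LOCATION"), (20, 22, "DURATION")}
predicted = {(5, 8, "DATE"), (14, 15, "LOCATION"),
             (20, 21, "DURATION"), (30, 31, "DATE")}

scores = per_label_scores(gold, predicted)
for label, values in scores.items():
    print(label, values)
```

Running the same function over the predictions of two different models on the same gold set gives a label-by-label comparison of the kind described above.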
We believe that the major contributor to our model’s high accuracy is domain adaptation of the transformer model on legal contracts. Domain adaptation allows our model to understand complex legal nuances and deal with specific legal vocabulary that is usually not present in pre-trained language models.
Sounds interesting?
Come work at SpotDraft and help automate the mundane! Check out spotdraft.com/careers