Great talk with Demi about natural language processing, machine translation and the challenges Global-Regulation (GR) solved with IBM Watson.
In the legal services industry, Global-Regulation is using NLP and machine translation to build the most comprehensive world law search engine. I recently spoke with CTO Sebastian Dusterwald to discuss how Global-Regulation uses Watson NLP technology to translate laws into English.
Sean: Sebastian, tell us about Global-Regulation and what your team does.
Sebastian: At Global-Regulation, it is our mission to democratize access to laws from across the globe. We handle large amounts of text data. We index, process, and translate nearly 2 million laws from nearly 100 countries, from Brazil to China to France to Italy and more, using machine translation. We help make laws searchable and accessible in English. We do all of this with a very small team, and none of it would be possible without the amazing AI-powered cloud services provided by the Watson platform.
Sean: Very cool. Do you have any recent examples to share about how the team is using Watson?
Sebastian: Recently, one of our clients approached us about adding categories to our law metadata. They monitor specific types of laws (such as healthcare and cybersecurity) to maintain regulatory compliance, and categories would make it easier for them to find the laws relevant to that use case. With so many laws in our database, discoverability is always an issue, so we thought this could be a great feature to add to our site. The problem is that very few of our sources provide any sort of categorization metadata, and those that do all use slightly different categories, so simply grabbing this data during indexing was not an option.
We needed a system that could analyze and process our text data and then sort it into preset categories. IBM suggested that we try the IBM Watson Natural Language Understanding (NLU) API, which does exactly what we wanted out of the box: it lets us upload training data and then classify natural language text based on that data.
Sean: Interesting, so what did you do next?
Sebastian: Well, we went through our database to find several laws that we thought were representative of each category, from finance to cybersecurity to environment-based laws. We then went through each of those laws and picked out chunks of text that we thought were relevant to the category. This was the most complicated and labor-intensive part of the implementation process. We had to choose chunks of text that were specific enough to teach the NLP algorithm about the domain the law refers to, while being generic enough not to overfit the algorithm. This meant avoiding words such as specific names of countries or people, or dates. Including them would have risked training the algorithm on keywords that look very specific but have nothing to do with the category at hand.
The training set was simply entered into a spreadsheet and then uploaded to the IBM Watson NLU API. After a short wait while it processed the data, the API was ready to accept queries. Our approach was to use the first 1024 characters of a law to classify it. This generated quite good results, in part because the first 1024 characters of a law typically include its title, which tends to contain a number of keywords that the algorithm can use. At this stage we were pretty sure that the IBM Watson NLP technology would be suitable for our use case, albeit with a little fine-tuning.
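The scrubbing rule described above (keep country names, people’s names and dates out of the training chunks) can be sketched in a few lines of Python. This is a minimal illustration, not Global-Regulation’s pipeline: the PROPER_NAMES set and the date patterns are hypothetical placeholders, and a real version would need a far larger name list or a named-entity recognizer.

```python
import re

# Hypothetical stop-list of proper names to strip from training chunks.
# A real list would cover every country, region and many personal names.
PROPER_NAMES = {"France", "Brazil", "China", "Italy", "Canada"}

# Matches numeric dates (2019-05-12, 12/05/2019), spelled-out dates
# (12 May 2019) and bare four-digit years from 1900 to 2099.
DATE_PATTERN = re.compile(
    r"\b\d{1,4}[/-]\d{1,2}[/-]\d{1,4}\b"
    r"|\b\d{1,2}\s+(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b"
    r"|\b(?:19|20)\d{2}\b"
)


def scrub_chunk(text: str) -> str:
    """Remove dates and listed proper names, then collapse whitespace."""
    text = DATE_PATTERN.sub(" ", text)
    for name in PROPER_NAMES:
        text = re.sub(rf"\b{re.escape(name)}\b", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Stripping these tokens trades a little readability of the training text for a classifier that keys on domain vocabulary rather than on incidental names and dates.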
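A minimal sketch of the two data steps mentioned here, assuming the spreadsheet is exported as a two-column CSV of (text, category) rows. The exact training-file format Watson NLU expects should be checked against IBM’s documentation, the upload call itself is not shown, and the helper names training_csv and query_text are hypothetical.

```python
import csv
import io

QUERY_CHARS = 1024  # the first 1024 characters of a law, as in the interview


def training_csv(examples: list[tuple[str, str]]) -> str:
    """Serialize (text, category) pairs as two-column CSV rows.

    Assumed layout: one training example per row, text first, label second.
    The csv module quotes fields that contain commas or quotes.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for text, category in examples:
        writer.writerow([text, category])
    return buffer.getvalue()


def query_text(law_text: str) -> str:
    """Return the leading slice of a law, which usually includes its title."""
    return law_text[:QUERY_CHARS]
```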
Sean: That’s great! Can you tell me a bit more about how you then fine-tuned Watson to meet your client’s use case?
Sebastian: The first thing we did was to sample text from across each law in our database, whether healthcare, welfare, or privacy-based. Instead of taking just the first 1024 characters of each document, we took 5 samples of 1024 characters each, spread evenly across the document. We then averaged the confidence scores returned by the IBM Watson NLU API for each category and chose the category with the highest average as the category for that law. This significantly increased the accuracy of the classifier on our dataset.
Next we looked at laws that had been classified incorrectly. We compiled a list of such laws and went through them, adding more text fragments from each of them to the training set. Once this was done, we uploaded the expanded set to the IBM Watson NLU API and waited for it to train a new, improved classifier. This further improved the accuracy, and at this point we were happy with it. As a final step, we ran the classifier across our entire database of laws.
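The sampling-and-averaging scheme can be sketched as follows. Here classify is a hypothetical stand-in for a call to the Watson NLU classifier that returns a {category: confidence} mapping for one chunk; spacing the chunk start positions evenly and treating a missing category as a score of 0 are assumptions, not details given in the interview.

```python
from collections import defaultdict
from typing import Callable

CHUNK_CHARS = 1024
NUM_SAMPLES = 5


def sample_chunks(text: str, n: int = NUM_SAMPLES, size: int = CHUNK_CHARS) -> list[str]:
    """Take n chunks of `size` characters with evenly spaced start positions.

    The first chunk starts at 0 and the last ends at the end of the text;
    chunks overlap when the document is shorter than n * size.
    """
    if len(text) <= size or n <= 1:
        return [text[:size]]
    last_start = len(text) - size
    starts = [round(i * last_start / (n - 1)) for i in range(n)]
    return [text[s:s + size] for s in starts]


def classify_law(text: str, classify: Callable[[str], dict[str, float]]) -> str:
    """Average per-category confidences over the samples and pick the best."""
    chunks = sample_chunks(text)
    totals: dict[str, float] = defaultdict(float)
    for chunk in chunks:
        # A category absent from one chunk's result contributes 0 for it.
        for category, score in classify(chunk).items():
            totals[category] += score
    averages = {c: s / len(chunks) for c, s in totals.items()}
    return max(averages, key=averages.get)
```

Dividing by the number of chunks does not change which category wins, but it keeps the averaged scores on the same 0 to 1 scale as the API’s confidences, which is convenient if a minimum-confidence threshold is added later.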
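The error-driven retraining step can be sketched like this, assuming a hand-labelled set of laws with known categories. Both predict and pick_fragments are hypothetical: in the interview the extra fragments were chosen by hand, so pick_fragments stands in for that human review rather than automating it.

```python
from typing import Callable


def find_misclassified(
    labelled: dict[str, tuple[str, str]],
    predict: Callable[[str], str],
) -> list[str]:
    """Return ids of laws whose predicted category differs from the known one.

    `labelled` maps a law id to (text, correct_category).
    """
    return [law_id for law_id, (text, gold) in labelled.items() if predict(text) != gold]


def extend_training_set(
    training: list[tuple[str, str]],
    labelled: dict[str, tuple[str, str]],
    misclassified: list[str],
    pick_fragments: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """Append fragments of each misclassified law under its correct category."""
    extended = list(training)
    for law_id in misclassified:
        text, gold = labelled[law_id]
        extended.extend((fragment, gold) for fragment in pick_fragments(text))
    return extended
```

The extended set would then be uploaded to train a new classifier, and the loop repeated until the error rate on the labelled laws is acceptable.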
Sean: Glad to hear it all worked out. Do you have any final thoughts or takeaways you would like to share about working with IBM Watson NLP technology?
Sebastian: Yes, absolutely! As you can tell, automatically translating and classifying nearly 2 million documents into a number of categories was a daunting task for a small team working with limited resources. With the volume of new laws coming in globally, our company needs to keep up with the demand and with constant changes to existing laws at a global scale. We can say confidently that without the help of the Watson platform we would not be able to translate and categorize the millions of documents coming into our database in such a short time. We managed to have the basic implementation running in about a week, which is phenomenal! Thanks to the Watson platform, our small company can punch well above its weight.
Originally published in Watson Blog – https://www.ibm.com/blogs/watson/2020/05/nlp-in-the-real-world-how-global-regulation-organized-its-comparative-law-search-engine-with-watson/
I think that the basis for every law, and many times for a sub-system of laws, is a regulatory mechanism, e.g., command and control for criminal law, or green-light regulation for environmental law (you will get a tax credit if you reduce your emissions).
I think we’re both right.
Theoretically, the elected government is supposed to represent the public’s priorities and hence instruct the administration to come up with regulatory mechanisms to be translated into laws to effectively address these priorities. This is the democratic system.
Updating the company’s policies based on new legislation is a major part of a regulatory manager’s work. Until recently, this task was especially challenging because there was no reliable one-stop source for updates on new legislation.
While a few companies provide updates on new legislation, their coverage is limited to North America (see, for example, Lexis’ State Net and Pulse, Fiscal Note, and Govtmonitor in Canada). Others, such as 8of9 RegAlytics and Compliance.ai, provide updates on financial regulation.
In order to face this challenge, Global-Regulation has utilised its system to start providing weekly updates on new laws from 46 countries, about half of which are machine translated to English.
Our ‘new laws’ section shows new laws from 46 countries with the option to filter by country.
In addition, we created an option to receive weekly email alerts on new laws based on the user’s keyword (please note: keyword email alerts for new laws can be created only by subscribers).
After creating the alerts, the user will receive a weekly email whenever her keywords appear in new laws.
These personally customised keyword based email alerts are available for unlimited users under the corporate subscription.
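Matching new laws against subscribers’ keywords could look roughly like the sketch below. It assumes whole-word, case-insensitive matching over each law’s title and text and batches the matches into one weekly digest per user; none of this is taken from Global-Regulation’s actual alert system, and sending the emails is out of scope.

```python
import re
from collections import defaultdict


def weekly_digests(
    new_laws: list[tuple[str, str]],
    subscriptions: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each user to the titles of this week's new laws matching a keyword.

    new_laws holds (title, text) pairs; subscriptions maps a user's email
    address to her keywords. Users with no matches are left out.
    """
    digests: dict[str, list[str]] = defaultdict(list)
    for user, keywords in subscriptions.items():
        patterns = [re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords]
        for title, text in new_laws:
            # any() stops at the first matching keyword, so a law is listed once.
            if any(p.search(title) or p.search(text) for p in patterns):
                digests[user].append(title)
    return dict(digests)
```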
Lawyers, or more precisely their information managers, will always find a problem with your legal information technology. It is not updated, it is not accurate, it uses machine translation, it’s not your junior lawyer, it does not make coffee.
Is it because they fear for their jobs, imagining that this IBM PR creation (ROSS) will take over 90% of their firms’ employment opportunities and leave young (and old) associates unemployed?!
Or maybe it is because they would not settle for less than the perfect machine that will save their firm a fortune by doing all the work by itself with no need to generate a pay slip?!
I argue that they are not yet ready for legal technology. Sure, they use search engines like Lexis and Westlaw, but when it comes to real legal technology, they are afraid. It disrupts their perception of the legal profession.
As always, the client is the one paying for the fear.
“UBS never took enough interest in its risks”, Financial Times, 20.12.2012
Let’s start with the bad news: we did not win the Americas UBS Future of Finance Challenge 2017. The good news is that we had the opportunity to pitch our RegTech vision and, just as importantly, to get an inside look at UBS’s use (or lack thereof) of technology in this field.
Our pitch was simple: you (UBS) need a regulatory compliance system like the one we currently offer for world laws, but much more advanced. A smart system that can track, translate, map, compare and digest new regulatory changes globally in less than an hour. A learning system that will co-evolve with the bank’s systems and thus prevent future fines and minimize risk.
The justification was straightforward: according to a recent BCG report, the number of individual regulatory changes that banks must track on a global scale has more than tripled since 2011, to an average of 200 revisions per day. This is not a scale humans can handle efficiently, so it is no surprise that banks paid $42 billion in fines in 2016 alone and $321 billion since 2008.
Technically speaking, the Americas finals in which we participated were organized to the last detail. Though dietary options (vegan, gluten-free, etc.) were not available, the bank allocated relevant representatives to meet with each finalist and provide feedback on the pitch. For us these meetings felt like development meetings, as the bank’s people offered great ideas to enhance our vision.
More importantly, it was an indication from a first-hand internal source that the bank (and other banks as well) is light years behind when it comes to RegTech and regulatory compliance. Given the bank’s spending in this field (in the billions), this is quite amazing, and it was certainly reassuring going into the pitching competition.
Inconveniently, while the mentoring session was held at the bank’s offices in Manhattan, the finals were held at the offices in New Jersey. This divide forced the candidates to move from one hotel to another and/or struggle with the massive transportation challenges that New York City has to offer.
As expected, there was no diversity among the judges: they were all IT people. The Americas CEO, Tom Natatil, gave the opening speech but did not stay for the actual competition. The judges were provided with feedback from the previous day’s mentors (ours was excellent) but gave no feedback or reasons for their choice of the winning pitch or of the 2nd and 3rd runners-up.
The winner, Authomate, pitched a mobile security system that lets the bank’s clients log into the bank’s portal safely. While the technology may be new, the concept is neither innovative nor disruptive. Moreover, based on corporate logic, this will probably be the last technology UBS will adopt.
It is too early to say whether the bank will be interested in our vision for the future. Likewise, it was not clear whether the finalists were supposed to pitch a future venture that could be developed with the bank, or what they already had (like Authomate) for the bank to use. Either way, one thing was clear: as with most big corporations, UBS’s structure is very fragmented, and capturing the attention of the relevant person is extremely challenging.
To summarize the experience, I would like to use the same citation I used at the end of my pitch: “Increasing regulation is here to stay – much like a permanent rise in sea level. In an era of rising regulatory seas, focus on management is mandatory, not optional. Top performers will use the opportunity to incorporate technical innovation” (BCG Report).
Whether UBS is a top performer is yet to be seen.
Governance, Risk management and Compliance (GRC) platforms are the organization’s tool to help handle, among other things, its regulatory affairs. This is what the RSA Archer® Suite is designed to provide through RSA® Archer® Regulatory & Corporate Compliance Management.
With most of the world’s laws (1.6 million laws from 90 countries, translated from 30 languages) available in English, in addition to a complexity map and an AI-driven penalty identifier, Global-Regulation.com is well positioned to complement the RSA Archer Suite.
This is why Global-Regulation has established technology interoperability with the RSA Archer Suite, offering customers an XML download of the world’s laws from Global-Regulation.com directly into RSA Archer Regulatory & Corporate Compliance Management, to give them better visibility into their compliance needs.
Now, with the launch of the RSA Archer Exchange for RSA Archer customers, this technology interoperability is even more seamless than before.
After using MS machine translation (and some Google) to translate more than 750,000 laws and regulations from 26 languages, we are featured in a new MS Translator case study.
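As a rough illustration of what an XML download of law records might look like, the sketch below serializes a few fields with Python’s standard xml.etree.ElementTree module. The element names (laws, law, country, title, url) are hypothetical; the actual schema used for the RSA Archer integration is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields exported per law; the real integration may differ.
FIELDS = ("country", "title", "url")


def laws_to_xml(laws: list[dict[str, str]]) -> str:
    """Serialize law records into a simple <laws><law>...</law></laws> document.

    ElementTree escapes special characters such as & and < automatically.
    """
    root = ET.Element("laws")
    for law in laws:
        item = ET.SubElement(root, "law")
        for field in FIELDS:
            ET.SubElement(item, field).text = law.get(field, "")
    return ET.tostring(root, encoding="unicode")
```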
What if you could discuss your search query with the search engine? Well, now you can. Our new feature suggests search ideas based on the user’s query. These search ideas are extracted from our database of the world’s laws itself.
Here’s how it works:
1. We take the text of every law in the world and extract its most frequently mentioned word pairs, on a per-law basis. This creates a new database of word pairs.
2. When someone searches, we check the word-pair database and take the word pairs that occur most frequently together with the word or word pair being searched. So a search for "coffee" returns keyword suggestions drawn from the words that appear most commonly in laws that mention "coffee".
3. We then filter the words, take the best matches and display them to the user. These are the search ideas.
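The three steps above can be sketched in Python as follows. The tokenizer, the stop-word list, the per-law cut-off k and the choice to score a suggestion by the number of laws in which it co-occurs with the query are all assumptions for illustration; the post does not specify how Global-Regulation tokenizes text or filters the suggestions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "a", "an", "to", "in", "or", "for", "on", "by", "is", "be", "any", "shall"}


def top_pairs(text: str, k: int = 20) -> list[str]:
    """Step 1: the k most frequent adjacent word pairs in one law.

    Stop words are dropped before pairing, so "protection of data"
    yields the pair "protection data".
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    pairs = Counter(f"{a} {b}" for a, b in zip(words, words[1:]))
    return [pair for pair, _ in pairs.most_common(k)]


def build_pair_index(laws: dict[str, str], k: int = 20) -> dict[str, list[str]]:
    """Map each law id to its top word pairs (the word-pair database)."""
    return {law_id: top_pairs(text, k) for law_id, text in laws.items()}


def search_ideas(query: str, index: dict[str, list[str]], n: int = 5) -> list[str]:
    """Steps 2 and 3: pairs that co-occur most often with the query.

    A law matches if one of its top pairs contains the query word, or
    equals the query when the query is itself a word pair. The query
    pair is filtered out of the suggestions.
    """
    q = query.lower().strip()
    counts: Counter[str] = Counter()
    for pairs in index.values():
        if any(q in pair.split() or q == pair for pair in pairs):
            counts.update(p for p in pairs if p != q)
    return [pair for pair, _ in counts.most_common(n)]
```

Calling search_ideas again with a suggestion it returned mirrors the click-through trail described next: each chosen pair becomes the new query.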
You can click on the search ideas shown in yellow at the top, and they will update according to your latest search. For example, let’s say you start with Coffee, then choose ‘Coffee Agreement’,
and then choose ‘system certificates’. The trail is endless.
This new feature actually enables you to interact with the search engine and follow a trail based on the database of word pairs we created from our gigantic database of the world’s laws.
I had a wonderful experience with my 4th-year Law & Technology students last Thursday. I asked them to search Global-Regulation.com for privacy laws that relate to teenagers and then create a scenario that describes these rights in a way that teenagers can understand.
After creating the scenario, the students, working in groups, needed to choose a picture for each square of the scenario, and we uploaded the results to a website I created for this purpose, Privacygames.com.
The results were amazing, and the students were fascinated both by searching legislation in Global-Regulation and by creating the scenarios.
The best scenario illustrated legislation that protects the privacy of teenagers by providing that a physician has discretion to report the pregnancy of a girl under 16 to her parents if he feels that she is not capable of dealing with the situation.
Another scenario described new legislation in New Zealand that makes it an offence to engage in ‘revenge porn’.
Empowering teenagers by informing them of their rights and obligations is an exciting field that should be fostered. Using Global-Regulation for class exercises is really intriguing for the students.