Software That Reads Laws: PenaltyAI Search – Global Risk & Compliance Redefined

About a year ago I sat with my CTO in the Manhattan office of one of the world’s largest accounting firms. Their global regulatory compliance team was very impressed with what we had done so far with Global-Regulation and wanted to know what more we could do. As usual with large firms, they wanted a system that does everything, from tracking new bills to predicting the future (step 10 instead of step 3).
The ambition to create the ultimate risk and compliance system stuck with us. It came to life when we realized, in one of our internal discussions about our global law search engine, that penalties are the kind of information an artificially intelligent system can identify with a high degree of certainty.

My story begins in the 2000s when I helped the Israeli court system work with IBM to digitize legal information. I’ve seen the slow evolution of legaltech and listened to the ambitious ideas of tech people. But I’ve also seen the reality of legal technology and wondered: how can we give machines the insight of lawyers?

Fast forward to 2017: after seemingly endless testing, experimenting, coding, consulting (thank you to Kyle Gorman from Google for recommending a words-to-numbers converter) and hard work, we are extremely excited to present PenaltyAI Search, the first and only AI system that identifies compliance clauses in legislation on a global scale, extracts the actual penalty amounts and serves it all to the user in US dollars.

Now risk and compliance professionals can search and identify risk levels across jurisdictions on a specific topic without even reading the law. Let's say you are an IBM executive considering expanding your Watson services into new markets: with a click of the mouse, you can now use the PenaltyAI Search feature of Global-Regulation to learn the risk level of that move.

Screenshot of PenaltyAI Search for "tobacco nicotine"

Combine this with our complexity feature, suggested search ideas and related laws, and a risk & compliance team can feed Governance, Risk and Compliance (GRC) platforms with all the information needed to launch a new business line in a matter of hours. Before, this would have taken months and required an army of translators and an analytics division to determine risk and compliance.

We see this as a great achievement on several levels:

  1. building an AI system that can really read legal text and extract useful meaning;
  2. enabling risk and compliance professionals to explore real and relevant data on a global scale, in English;
  3. allowing governments and businesses to assess and enhance their compliance efforts; and finally,
  4. enabling researchers to compare and contrast risk and compliance data globally.

Thank you, big accounting firm, for teaching us that even seemingly unsuccessful business meetings can bring great results. Thank you, Microsoft Canada, for helping connect us with the Microsoft Translator team. Thank you, LegalX (now LawMade). Thank you, Ken Thompson, for UNIX and regular expressions. Thank you to my wife and children for your daily inspiration.

If you’d like to know more about how the system works technically, my CTO has written a blog post on building PenaltyAI Search.

Computers can now tell us about the penalties in the world’s laws.

Big Data With Purpose: How We Calculated the Fines of 1.55 Million Laws

This is a technical explanation of how we built our “PenaltyAI Search” service, which combs 1.55 million laws from 79 countries for fines. It can answer questions like “What would I pay for violating money laundering laws in Jamaica?” or “How much would a smuggler who warehouses stolen goods in China pay if they’re caught?”

The penalties are extracted by an offline algorithm, running on an Azure VM, that performs the following steps:

  1. Find laws that mention keywords associated with civil penalties (as a first pass)
  2. Convert numbers written as words (like “one million”) into numerals in international number format (“1,000,000.00”)
  3. Identify the paragraphs that likely contain civil penalties based on words and numbers
  4. Merge several penalties into one if they relate to the same “clause” (section) of a law
  5. Extract all the clauses and penalties
  6. Exclude certain classes of text that are almost never penalties but look like penalties (such as laws about gold coins and section references in laws that have to do with money)
  7. Recognize currencies in the text, match them against our table of national currencies, and convert penalties into USD using Yahoo! Finance rates (through its XML API)
  8. Store the penalties and clauses in a MySQL database (RDS)
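The paragraph-level steps above (1, 3, 4 and 6) can be sketched roughly as follows. This is a minimal illustration, not the production code: the keyword list, the currency markers, the exclusion patterns and the clause IDs are all assumptions, and plain regular expressions stand in for whatever rules the real pipeline applies.

```python
import re
from collections import defaultdict

# Hypothetical first-pass keywords (step 1); the production keyword list is not published.
KEYWORD_RE = re.compile(r"\b(?:fines?|penalt(?:y|ies)|liable|offen[cs]es?)\b", re.IGNORECASE)

# Money amounts (step 3): a currency marker before a number, or a comma-grouped
# number in international format such as 1,000,000.00.
AMOUNT_RE = re.compile(
    r"(?:[$£€]|\b(?:USD|EUR|GBP|INR|Rs\.?))\s?(\d[\d,]*(?:\.\d+)?)"
    r"|\b(\d{1,3}(?:,\d{3})+(?:\.\d+)?)"
)

# Step 6 (illustrative patterns): section references are blanked out so their numbers
# are not read as money, and paragraphs about gold coins are skipped entirely.
SECTION_REF_RE = re.compile(r"\bsections?\s+\d+[A-Za-z]?(?:\(\w+\))*", re.IGNORECASE)
EXCLUDE_RE = re.compile(r"\bgold coins?\b", re.IGNORECASE)


def is_candidate_law(text):
    """Step 1: a cheap keyword test applied to the full text of a law."""
    return KEYWORD_RE.search(text) is not None


def amounts_in(paragraph):
    """Steps 3 and 6: amounts found in a paragraph that looks like a penalty provision."""
    if EXCLUDE_RE.search(paragraph) or not KEYWORD_RE.search(paragraph):
        return []
    cleaned = SECTION_REF_RE.sub(" ", paragraph)
    return [float((m.group(1) or m.group(2)).replace(",", "")) for m in AMOUNT_RE.finditer(cleaned)]


def penalties_by_clause(paragraphs):
    """Step 4: merge the amounts of all paragraphs that share a clause ID into one record."""
    merged = defaultdict(list)
    for clause_id, text in paragraphs:
        merged[clause_id].extend(amounts_in(text))
    return {clause_id: amounts for clause_id, amounts in merged.items() if amounts}


# Made-up sample paragraphs, each tagged with a hypothetical clause ID.
law = [
    ("s12", "A person who contravenes section 12(3) is liable to a fine not exceeding $5,000."),
    ("s12", "For a second offence, the fine is 10,000.00 dollars."),
    ("s40", "The Mint may issue gold coins with a face value of $1,000."),
    ("s41", "The Minister may make regulations under section 40."),
]
print(penalties_by_clause(law))  # {'s12': [5000.0, 10000.0]}
```

Blanking section references before matching amounts (rather than rejecting any paragraph that cites a section) matters because real penalty clauses very often point to the section they enforce.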

Screenshot of one of the MySQL tables for penalties

We then record in our search index whether or not a law has penalties attached to it, so that searches can be filtered to laws that have penalties (as opposed to our regular search, which also includes laws without explicit fines). This process runs as an offline batch job because processing our 1.55 million+ laws takes several hours, and no one would wait that long for their search results!
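As a rough sketch, assuming search documents with hypothetical `id` and `text` fields, the offline job could add a boolean `has_penalties` field to each document, and a penalties-only search could then use a standard Elasticsearch `bool` query with a `term` filter on that field. The code below only builds the documents and the query body; it does not connect to a cluster.

```python
import json


def annotate_laws(laws, law_ids_with_penalties):
    """Offline batch step: copy each search document and add a boolean has_penalties field."""
    flagged = set(law_ids_with_penalties)
    return [dict(doc, has_penalties=doc["id"] in flagged) for doc in laws]


def penalty_search_body(query_text):
    """Elasticsearch query DSL: full-text match on the law text, restricted to laws with penalties."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"text": query_text}},
                # Filter clauses run in filter context: they do not affect scoring
                # and are candidates for caching by Elasticsearch.
                "filter": {"term": {"has_penalties": True}},
            }
        }
    }


# Hypothetical documents; the IDs and text are placeholders.
laws = [
    {"id": "JM-001", "text": "sample text A"},
    {"id": "JM-002", "text": "sample text B"},
]
docs = annotate_laws(laws, ["JM-001"])
print([d["has_penalties"] for d in docs])  # [True, False]
print(json.dumps(penalty_search_body("money laundering"), indent=2))
```

Keeping the flag as a filter rather than part of the scored query means the regular search and the penalties search can share one index and one relevance model.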

When a user runs a search, the query is first sent to our Elasticsearch instance, and the penalties are then looked up in the MySQL database. This combines full-text search of laws with penalties while placing much less strain on our relational database (because penalties are looked up by ID rather than through a JOIN). Storing the penalties separately also reduces the amount of data in the in-memory search instance and decouples our services (since we have other types of search, such as technical standards and law analytics).
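The two-step lookup can be sketched like this, with Python's built-in `sqlite3` standing in for MySQL and a hard-coded list standing in for the Elasticsearch hits. The `penalties` table layout, its columns, the function names and the sample rows are all made up for illustration. A MySQL driver such as PyMySQL would typically use `%s` placeholders instead of `?`.

```python
import sqlite3


def penalties_for_hits(conn, law_ids):
    """Fetch penalties for the law IDs returned by the search index in one IN (...) query."""
    if not law_ids:
        return {}
    placeholders = ",".join("?" for _ in law_ids)
    rows = conn.execute(
        "SELECT law_id, clause, amount_usd FROM penalties "
        f"WHERE law_id IN ({placeholders}) ORDER BY law_id, clause",
        list(law_ids),
    ).fetchall()
    by_law = {}
    for law_id, clause, amount_usd in rows:
        by_law.setdefault(law_id, []).append((clause, amount_usd))
    return by_law


def attach_penalties(conn, hits):
    """Keep the search engine's ranking and attach each hit's penalties (empty list if none)."""
    by_law = penalties_for_hits(conn, [hit["id"] for hit in hits])
    return [dict(hit, penalties=by_law.get(hit["id"], [])) for hit in hits]


# Demo: an in-memory table with made-up rows stands in for the MySQL penalties table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penalties (law_id TEXT, clause TEXT, amount_usd REAL)")
conn.executemany(
    "INSERT INTO penalties VALUES (?, ?, ?)",
    [("JM-001", "s12", 5000.0), ("JM-001", "s3", 250.0),
     ("CN-042", "art7", 1500.0), ("FR-900", "l1", 80.0)],
)

# Hits as a search engine might return them: hypothetical IDs in relevance order.
hits = [{"id": "CN-042", "score": 7.1}, {"id": "JM-001", "score": 5.3}, {"id": "DE-007", "score": 2.0}]
results = attach_penalties(conn, hits)
for r in results:
    print(r["id"], r["penalties"])
```

Because the search engine returns only a page of IDs, the relational query stays a small primary-key lookup no matter how large the full-text index grows.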

The laws themselves are indexed, downloaded, converted to text, parsed and translated into English by our pipeline, which runs on another Azure VM with RDS as the data store. We make extensive use of the Microsoft Translator API to translate foreign legislation into English (since most of the world’s laws are published in languages other than English). Our use of the service is listed on the “Customers” page for Microsoft Translator. We’ve written elsewhere on our blog about some of the ways we gather and process world legislation.

Graphing the World’s Laws: Visualization of 1.55 Million Laws + Our PenaltyAI Search

The graph above is the first-ever visualization of penalties for non-compliance with the world’s laws. It was made possible by the culmination of Global-Regulation Inc.’s R&D efforts over the last year to create an automated AI method for reading penalty provisions in civil laws (see the system here).

Our system (which we’re calling “PenaltyAI Search”) can now extract penalties from legislation (statutes and regulations) and present them in US dollars, along with the original text. This is a multi-phase process that starts with an AI-based algorithm that identifies the penalty clauses. The next step extracts the penalty amount from each penalty clause. This step includes a complex linguistic mechanism that converts amounts written in words into numbers, such as “one hundred thousand” to 100,000, and handles Indian English notation such as “lakh” and “crore”. The next step converts different notation systems into a standardized decimal format (e.g. “560,99” to 560.99). The final step converts all the world’s currencies into USD to enable comparison on a global scale (this is done on an ongoing basis to account for currency fluctuations).
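A minimal sketch of the amount-normalization phases, assuming simple word lists: it converts number words (including “lakh” and “crore”) to integers, normalizes different decimal and grouping conventions, and converts to USD. This is not the converter the team used; the function names are hypothetical and the exchange rates are placeholders, not real quotes. One ambiguity it does not resolve: a single dot followed by three digits, as in “1.000”, is read as a decimal point here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Number words up to ninety; "hundred" and the scale words are handled separately.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
    "thirteen": 13, "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
    "eighteen": 18, "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
# Scale words, including Indian English "lakh" (10^5) and "crore" (10^7).
SCALES = {
    "thousand": 10**3, "lakh": 10**5, "lakhs": 10**5, "million": 10**6,
    "crore": 10**7, "crores": 10**7, "billion": 10**9,
}


def words_to_number(phrase):
    """Convert number words, e.g. "two crore fifty lakh", into an int."""
    total, current, last_scale = 0, 0, 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            scale = SCALES[word]
            if last_scale and scale > last_scale:
                # A larger scale after a smaller one multiplies everything so far,
                # e.g. "one thousand crore" = 1,000 x 10^7.
                total = (total + current) * scale
            else:
                total += current * scale
            last_scale, current = scale, 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def normalize_decimal(raw):
    """Normalize "560,99", "1.000.000,50", "1,000,000.00" or "1,00,000" to a Decimal."""
    s = raw.strip().replace(" ", "").replace("\u00a0", "")
    if "," in s and "." in s:
        # Whichever separator comes last is taken as the decimal mark.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        head, _, tail = s.rpartition(",")
        if s.count(",") == 1 and len(tail) != 3:
            s = f"{head}.{tail}"  # a single decimal comma, as in "560,99"
        else:
            s = s.replace(",", "")  # grouping commas, including Indian "1,00,000"
    elif s.count(".") > 1:
        s = s.replace(".", "")  # dots as thousands separators, as in "1.000.000"
    return Decimal(s)


# Placeholder rates for illustration only (not real quotes); the post says the
# production system fetched live rates.
RATES_TO_USD = {"USD": Decimal("1"), "INR": Decimal("0.015"), "EUR": Decimal("1.10")}


def to_usd(amount, currency, rates=RATES_TO_USD):
    """Convert an amount to USD, rounded to cents."""
    return (Decimal(amount) * rates[currency]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(words_to_number("one hundred thousand"))    # 100000
print(words_to_number("two crore fifty lakh"))    # 25000000
print(normalize_decimal("560,99"))                # 560.99
print(to_usd(normalize_decimal("560,99"), "EUR")) # 617.09
```

Using `Decimal` rather than `float` keeps amounts like 560.99 exact through the conversion, which matters when penalties are later compared or aggregated.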

As for the graph at the top of this page, it was created by applying PenaltyAI Search to all of the laws in the Global-Regulation.com database (currently around 1.55 million laws from 79 countries) and then excluding countries with only a small number of laws available or too few penalties to support useful statistical inferences. We’re making the Excel file for the graph available here: World Penalties – Feb 9 2017. To eliminate outliers, we’ve excluded all penalties except those among the twenty most frequent for each country. If you make any use of this data, please link back to this blog post and let us know by pinging us on Twitter @globeregulation.
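The top-twenty filter can be expressed with `collections.Counter`. This sketch assumes the extracted penalties are available as (country, USD amount) pairs; the function name is hypothetical, and so is the `min_penalties` threshold for dropping countries with too few penalties, since the post does not give the actual cut-off.

```python
from collections import Counter, defaultdict


def frequent_penalties_by_country(records, top_n=20, min_penalties=1):
    """Keep, per country, only amounts among its top_n most frequent values.

    records: iterable of (country, amount_usd) pairs.
    min_penalties: hypothetical cut-off below which a country is dropped entirely.
    """
    by_country = defaultdict(list)
    for country, amount_usd in records:
        by_country[country].append(amount_usd)
    kept = {}
    for country, amounts in by_country.items():
        if len(amounts) < min_penalties:
            continue
        # Counter.most_common orders equal counts by first occurrence.
        common = {value for value, _ in Counter(amounts).most_common(top_n)}
        kept[country] = [a for a in amounts if a in common]
    return kept


# Made-up data with placeholder country codes; the rare 99999.0 acts as an outlier.
records = [("AA", 100.0)] * 5 + [("AA", 500.0)] * 3 + [("AA", 99999.0)] + [("BB", 50.0)] * 2
print(frequent_penalties_by_country(records, top_n=2, min_penalties=3))
# {'AA': [100.0, 100.0, 100.0, 100.0, 100.0, 500.0, 500.0, 500.0]}
```

Filtering by frequency rather than by size drops one-off amounts regardless of whether they are unusually high or low, which suits penalty data where a single misread number can dominate a country's average.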

The PenaltyAI Search system has been integrated into the Global-Regulation.com search engine, and soon (within the next week) users will be able to search, explore and drill down into a given topic, across jurisdictions or filtered by country. As usual, these features will be accompanied by our innovative visualization display.

We see this system as a groundbreaking event in the field of extracting valuable information from legal text using algorithmic methods. On the theoretical level, it is proof that the text of legislation can be mined for insights; on the practical level, it is a celebratory milestone for compliance and GRC professionals, who will be able to use our system to simplify their work.

Congratulations to our technical team, who enabled us to go where no legal tech product has gone before.

More updates will be available in the next edition of our newsletter and will be rolled out to subscribers shortly thereafter.

Microsoft Translator Case Study

After using Microsoft machine translation (and some Google Translate) to translate more than 750,000 laws and regulations from 26 languages, we are featured in a new MS Translator Case Study:
https://www.microsoft.com/en-us/translator/customers.aspx#textsearch=global-regulation.
