Software That Reads Laws: PenaltyAI Search – Global Risk & Compliance Redefined

About a year ago I sat with my CTO in the Manhattan office of one of the world’s largest accounting firms. Their global regulatory compliance team was very impressed with what we had done so far with Global-Regulation and wanted to know what more we could do. As usual with large firms, they wanted a system that does everything – from tracking new bills to predicting the future (step 10 instead of step 3).
The ambition to create the ultimate risk and compliance system stuck with us. It came to life when we realized, in one of our internal discussions about our global law search engine, that penalties are the kind of information that an artificially intelligent system can identify with a high degree of certainty.

My story begins in the 2000s when I helped the Israeli court system work with IBM to digitize legal information. I’ve seen the slow evolution of legaltech and listened to the ambitious ideas of tech people. But I’ve also seen the reality of legal technology and wondered: how can we give machines the insight of lawyers?

Fast forward to 2017, after seemingly endless testing, experimenting, coding, consulting (thank you to Kyle Gorman from Google for the words to numbers converter recommendation) and hard work – we are extremely excited to present the PenaltyAI Search – the first and only AI system that identifies compliance clauses in legislation on a global scale, extracts the actual penalty amounts and serves it all to the user in US dollars.

Now risk and compliance professionals can search and identify risk levels across jurisdictions on a specific topic without even reading the law. Let’s say that you are an IBM executive considering global expansion of your Watson services to new markets – with a click of a mouse you can now use the PenaltyAI Search feature of Global-Regulation to learn the risk level of your goal.

Screenshot of PenaltyAI Search for "tobacco nicotine"

Combine this with our complexity feature, suggested search ideas and related laws – and a risk & compliance team can feed Governance, Risk and Compliance (GRC) platforms with all the information needed to launch a new business line, in a matter of hours. Before, this would have taken months and required an army of translators and a division of analytics to determine risk and compliance.

We see this as a great achievement on several levels:

  1. building an AI system that can really read legal text and produce useful meaning;
  2. enabling risk and compliance professionals to explore real and relevant data on a global scale, in English;
  3. allowing governments and businesses to assess and enhance their compliance efforts; and finally,
  4. allowing researchers to compare and contrast risk and compliance data globally.

Thank you big accounting firm for teaching us that even seemingly unsuccessful business meetings can bring great results. Thank you Microsoft Canada for your help in connecting us with the Microsoft Translator team. Thank you LegalX (now LawMade). Thank you Ken Thompson for UNIX and regular expressions. Thank you to my wife and children for your daily inspiration.

If you’d like to know more about how the system works technically, my CTO has written a blog post on building PenaltyAI Search.

Computers can now tell us about penalties for world laws.


Big Data With Purpose: How We Calculated the Fines of 1.55 Million Laws

This is a technical explanation of how we built our “PenaltyAI Search” service that combs 1.55 million world laws from 79 countries for fines. It can answer questions like “What would I pay for violating money laundering laws in Jamaica?” or “How much would a smuggler who warehouses stolen goods in China pay if they’re caught?”

The penalties are extracted by an offline algorithm, running on an Azure VM, that performs the following steps:

  1. Find laws that mention keywords associated with civil penalties (as a first pass)
  2. Convert all word numbers (like “one million”) into international number format (“1,000,000.00”)
  3. Identify the paragraphs that likely contain civil penalties based on words and numbers
  4. Merge several penalties into one when they relate to the same “clause” (section) of a law
  5. Extract all the clauses and penalties
  6. Exclude certain classes of text that are almost never penalties but look like penalties (such as laws about gold coins and section references in laws that have to do with money)
  7. Recognize currencies in text, and combine this data with our table of national currencies, and convert penalties into USD using Yahoo! Finance rates (through the XML API call)
  8. Store the penalties and clauses in a MySQL database (RDS)
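
A minimal sketch of steps 3, 4 and part of 6, to make the idea concrete. It is not our production code: the keyword pattern, the section heading pattern and the function name `penalties_by_section` are all assumptions made for illustration, and the real rules are considerably more involved.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the real keyword list and rules are not shown in this post.
PENALTY_WORDS = re.compile(r"\b(?:fine|fines|fined|penalty|penalties)\b", re.IGNORECASE)
SECTION_HEADING = re.compile(r"^\s*(?:Section|Article)\s+(\d+)", re.IGNORECASE)
SECTION_REF = re.compile(r"\b(?:section|article)\s+\d+", re.IGNORECASE)
AMOUNT = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def penalties_by_section(law_text):
    """Group candidate penalty amounts by the section they appear in."""
    found = defaultdict(list)
    section = None
    for paragraph in law_text.split("\n\n"):
        heading = SECTION_HEADING.match(paragraph)
        if heading:
            section = heading.group(1)
        if not PENALTY_WORDS.search(paragraph):
            continue  # step 3: keep only paragraphs that mention a penalty
        # Step 6 (in part): drop section references so "section 12" is not read as an amount.
        cleaned = SECTION_REF.sub("", paragraph)
        amounts = [float(a.replace(",", "")) for a in AMOUNT.findall(cleaned)]
        if amounts:
            found[section].extend(amounts)  # step 4: one entry per section
    return dict(found)
```

Paragraphs without a heading inherit the most recent section number, which is how two penalty paragraphs under the same section end up merged into one entry.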

Screenshot of one of the MySQL tables for penalties

We then note in our search instance whether or not a law has penalties attached to it, so that the search instance can filter by laws that have penalties (as opposed to our regular search that includes laws that don’t have explicit fines attached to them). This process is run as a batch job offline because our 1.55 million+ laws take several hours to process and no one would wait that long for their search results!

When a user does a search, the search is first sent to our Elasticsearch instance, and then the penalties are looked up from the MySQL database afterwards. This allows full-text search of laws to be combined with penalties, and in a way that results in much less strain on our relational database (because penalties are looked up by IDs rather than a JOIN). Storing the penalties separately allows us to reduce the amount of data in the in-memory search instance, and decouples our services (since we have other types of search like technical standards and law analytics).
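
The two-stage lookup can be sketched roughly as follows. This is an illustration under stated assumptions: `search_law_ids` is a stand-in for the Elasticsearch query, an in-memory SQLite table stands in for MySQL, and the table and column names are hypothetical.

```python
import sqlite3


def search_law_ids(query, index):
    """Stand-in for the full-text search: IDs of laws containing every query term."""
    terms = query.lower().split()
    return [law_id for law_id, text in index.items()
            if all(term in text.lower() for term in terms)]


def penalties_for(conn, law_ids):
    """Fetch penalties for the given law IDs with a single ID lookup (no JOIN)."""
    if not law_ids:
        return {}
    placeholders = ",".join("?" * len(law_ids))
    rows = conn.execute(
        "SELECT law_id, clause, amount_usd FROM penalties "
        f"WHERE law_id IN ({placeholders}) ORDER BY law_id, amount_usd",
        list(law_ids),
    )
    result = {}
    for law_id, clause, amount_usd in rows:
        result.setdefault(law_id, []).append((clause, amount_usd))
    return result


# Tiny demo with made-up data.
index = {1: "Tobacco and nicotine products act", 2: "Coffee export regulations"}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penalties (law_id INTEGER, clause TEXT, amount_usd REAL)")
conn.executemany(
    "INSERT INTO penalties VALUES (?, ?, ?)",
    [(1, "s. 4", 500.0), (1, "s. 9", 2000.0), (2, "s. 2", 150.0)],
)
hits = search_law_ids("tobacco nicotine", index)
results = penalties_for(conn, hits)
```

The relational database only ever sees a short list of IDs, so the expensive full-text work stays in the search instance.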

The laws themselves are indexed, downloaded, converted to text, parsed, and converted to English, using our pipeline that runs on another Azure VM with RDS as the data store. We make extensive use of the Microsoft Translator API to convert foreign legislation to English (since most of the world’s laws are published in languages other than English). Our use of the service is actually listed on the “Customers” page for Microsoft Translator. We’ve written elsewhere on our blog about some of the ways we gather and process world legislation.


Graphing the World’s Laws: Visualization of 1.55 Million Laws + Our PenaltyAI Search

The graph above is the first time that penalties for non-compliance with the world’s laws have been visualized. It was made possible by the culmination of Global-Regulation Inc.’s R&D efforts over the last year to create an automated AI method for reading penalty provisions from civil laws – see the system here.

Our system (that we’re calling “PenaltyAI Search”) is now able to extract penalties from legislation (statutes and regulations) and present them in US dollars, along with the original text. This is a multi-phase process that starts with an AI-based algorithm that identifies the penalty clauses. The next step is to extract the penalty amount from the penalty clause. This step includes a complex linguistic mechanism that converts amounts written in words into numbers (e.g. “one hundred thousand” to 100,000), including Indian English notation like “lakh” and “crore”. The next step is to convert different notation systems into a standardized decimal format (e.g. “560,99” to 560.99). The final step is converting all the world’s currencies into USD to enable comparison on a global scale (which is done on an ongoing basis to account for currency fluctuations).
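
To illustrate the amount extraction steps, here is a simplified sketch. It is an assumption-laden illustration rather than the mechanism we actually run: the function names are hypothetical, the word list covers only common English and Indian English number words, and the exchange rates are passed in as a plain dictionary that the caller must supply.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "lakh": 100_000, "million": 1_000_000,
          "crore": 10_000_000, "billion": 1_000_000_000}


def words_to_number(phrase):
    """Convert e.g. 'one hundred thousand' or 'two crore fifty lakh' to an int."""
    total, current = 0, 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def normalize_decimal(raw):
    """Read '560,99' or '1.234,50' (decimal comma) and '1,000,000.00' alike."""
    raw = raw.strip().replace(" ", "")
    if re.fullmatch(r"(?:\d{1,3}(?:\.\d{3})+|\d+),\d{1,2}", raw):
        return float(raw.replace(".", "").replace(",", "."))
    return float(raw.replace(",", ""))


def to_usd(amount, currency, rates):
    """Convert using a caller-supplied {currency code: USD per unit} table."""
    return round(amount * rates[currency], 2)
```

A comma followed by one or two trailing digits is treated as a decimal separator; a comma followed by three digits is treated as a thousands separator, which is a heuristic and can be wrong for unusual inputs.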

As for the graph at the top of this page, it was created by applying PenaltyAI Search to all of the laws in the database (currently around 1.55 million laws from 79 countries) and then excluding countries with only a small number of laws available or too few penalties to make any useful statistical inferences. We’re making available the Excel file for the graph here: World Penalties – Feb 9 2017. We’ve excluded any penalties other than those within the top twenty most frequent for each country in order to eliminate outliers. If you make any use of this data please link back to this blog post and let us know by pinging us on Twitter @globeregulation.

The PenaltyAI Search system has been implemented into the search engine and soon (within the next week) the user will be able to search, explore and drill down for a given topic, across jurisdictions or filtered by country. As usual, these features will be accompanied by our innovative visualization display.

We see this system as a groundbreaking event in the field of extracting valuable information from legal text using algorithmic methods. On the theoretical level this is proof that the text of legislation can be mined for insights, and on the practical level, this is a celebratory milestone for compliance and GRC professionals who will be able to use our system to simplify their work.

Congratulations to our technical team that enabled us to go to where no legal tech product has gone before.

More updates will be available in the next edition of our newsletter and will be rolled out to subscribers shortly thereafter.


Microsoft Translator Case Study

After using MS machine translation (and some Google) to translate more than 750,000 laws and regulations from 26 languages, we are featured in a new MS Translator Case Study:


Search Ideas – Interaction with the Search Engine

What if you could discuss your search query with the search engine? Well, now you can. Our new feature suggests search ideas based on the user’s query. These search ideas are extracted from our world’s laws database itself.

Here’s how it works:
1. We take the text of every law in the world and extract the most frequently mentioned word pairs, on a per-law basis. This way we create a new database of word pairs.
2. When someone does a search we check the database of word pairs and take the word pairs that occur most frequently in association with the word or word pair that the user is searching for. So a search for “coffee” will return keyword suggestions for words that appear in laws that mention “coffee” most commonly.
3. We then filter the words and take the best matches and display those to the user. These are the search ideas.
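
The three steps above can be sketched as follows. This is a simplified illustration, not our actual implementation: the stopword list, the in-memory dictionary used as the “database of word pairs” and the function names are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def top_pairs(text, n=5):
    """Step 1: the n most frequent adjacent word pairs in one law."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    return Counter(dict(counts.most_common(n)))


def build_pair_db(laws, n=5):
    """Build the word pair database: law ID -> its top pairs with counts."""
    return {law_id: top_pairs(text, n) for law_id, text in laws.items()}


def suggest(query, pair_db, limit=3):
    """Steps 2 and 3: pairs that co-occur with the query, best matches first."""
    normalized = " ".join(query.lower().split())
    terms = set(normalized.split())
    totals = Counter()
    for pairs in pair_db.values():
        # A law is relevant if one of its top pairs contains every query term.
        if any(terms <= set(pair.split()) for pair in pairs):
            totals.update(pairs)
    ideas = [pair for pair, _ in totals.most_common() if pair != normalized]
    return ideas[:limit]
```

Note that stopwords are removed before pairing, so two words separated only by “the” or “of” count as a pair in this sketch.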

You can click on the search ideas in yellow at the top and they will be updated according to your most recent search. For example, let’s say you started with Coffee –> then you choose ‘Coffee Agreement’

And then choose ‘system certificates’. This is endless.

This new feature actually enables you to interact with the search engine and follow the trail that is based on the database of word pairs we created from our gigantic database of the world’s laws.



LexisNexis vs. Westlaw: How Many Countries Can You Search?

Which countries can be searched on global legal research platforms? According to our research, Westlaw (as of 2017) has legislation search for 14 countries (counting the EU as a country) and LexisNexis has 12 countries.

Westlaw (Thomson Reuters) and LexisNexis (RELX Group) are the two largest legal research companies in the world. Wolters Kluwer is a close third, and in some jurisdictions is the main legal research company (they’re about 50/50 EU and North America). All of these companies offer a bewildering list of databases and sources, and none of them bundles multiple country search into one search engine. On one webpage, Westlaw claims that they make available 28,000 different databases worldwide.

According to our research, these are the countries for which LexisNexis has primary legal research search (i.e. national laws in a searchable format):

US, Canada, Australia, New Zealand, France, Ireland, India, UK, EU, South Africa, Hong Kong, Malaysia and Japan.

Westlaw (Thomson Reuters) offers the following countries:

Australia, New Zealand, Canada, UK, EU, USA, Philippines, Qatar, Iraq, UAE, Hong Kong, Argentina, Paraguay and Uruguay.

LexisNexis Sources: One Lexis page notes that there are nine countries in total but that doesn’t seem to be the case.

A caveat to the above infographic: there may be countries for which one of these companies offers legislation search that it either doesn’t advertise or makes very difficult to discover. They essentially operate as independent businesses in many countries and have other subscription services that are sub-licensed, so there may be other flags missing from the infographic above. If you discover a missing country please let us know so we can update this blog post.


Teaching with Global-Regulation

I had a wonderful experience with my 4th year Law & Technology students last Thursday. I asked them to search for privacy laws that relate to teenagers and then create a scenario that describes these rights in a way that teenagers can understand.
After creating the scenario, the students, working in groups, needed to choose the pictures for each square of the scenario and we uploaded it to a website I created for this purpose.

The results were amazing and the students were fascinated both by the legislation search in Global-Regulation, and with creating the scenarios.
The best scenario was an illustration of legislation that is set to protect the privacy of teenagers by determining that a physician has discretion to report a pregnancy of a girl under 16 to her parents if he feels that she is not capable of dealing with the situation.

Another scenario was describing new legislation in New Zealand that makes it an offence to engage in ‘revenge porn’.

Empowering teenagers by informing them of their rights and obligations is an exciting field that should be fostered. Using Global-Regulation for class exercises is really intriguing for the students.


The First Law Chatbot: GRBOT

Screenshot of chatbot on iOS
We are very excited to announce the launch of the first ever legislation chatbot: GRBOT, using the Kik mobile platform. The GRBOT enables the users to type a request: e.g., “search for United States drone laws” or “Show me EU laws about organic farming”, and the GRBOT will send the most relevant laws to the users’ mobile device, with a link to see more.

You can see the bot in the Kik Bot Shop here:

The only thing needed to connect to the GRBOT is to download the Kik app from the app store and friend the GRBOT account. When you first start chatting a message is displayed that shows a few examples of how to interact with the bot.

We think we’re the first ever legislation chatbot but there have been other legal bots created, both on chat platforms and accessible through other services.

Other legal bots already around include one built by Cambridge students to advise sexual assault victims and another designed to analyze contracts. Others include DoNotPay, created by an 18-year old British coder, that quickly handles ticket appeals through a Q&A chat; and of course ROSS (but see skeptical voices). See also Lexi – “You can chat to Lexi to generate a free Privacy Policy or Non-Disclosure Agreement”. See also Fastcase bad law bot.

Screenshot of Kik Bot Shop


Machine Learning – Text Analytics Comparison

As part of our work integrating Artificial Intelligence, and especially Machine Learning, into Global-Regulation‘s system, we’ve conducted a comparison between the big four providers of ML Text Analytics: Microsoft, Google, IBM and Amazon. This post is a follow-up to a previous post regarding an AI assisted compliance system.

Microsoft – MS ML Studio offers some text analytics options.

Although not particularly helpful for the purpose of identifying segments within legislation, MS ML Studio is the friendliest system among the ML tools in this comparison. It is so friendly that even a user with minimal background in programming and ML can use it (with some patience and a strong will 🙂).

In MS ML Studio there is a link to new text analytics models, but unfortunately the link is broken.

Google – Tensorflow offers some text analytics features. This is not a friendly tool and the text analytics options it does offer are vague. However, the vector representation of words may be useful when analyzing legal text and training a model to identify segments within legislation. This is a different approach than the structured text analytics offered by MS and IBM – see below.


In the context of a previous post about an AI assisted compliance system, Tensorflow vector representation may be the solution for the first part of the challenge, i.e., manually identifying compliance clauses and training the model with these clauses. Nonetheless, new challenges arise in the implementation stage since the system will be able to identify laws that include compliance clauses but not the specific clauses within the law.

Overcoming this challenge will require an additional stage in which the laws may be broken into chunks of text before running the model to identify the clauses. As laws are not always (and usually not) machine friendly, this process creates its own challenges.
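
As a rough illustration of that additional stage, the sketch below breaks a law into chunks at section headings and then runs a classifier over each chunk. Everything here is an assumption for illustration: the heading pattern, the `max_words` limit and the function names are hypothetical, and `is_compliance_clause` stands in for whatever trained model would be used.

```python
import re

# Hypothetical heading pattern; real laws use many more numbering styles.
HEADING = re.compile(r"^[ \t]*(?:Section|Article)\s+\d+", re.MULTILINE | re.IGNORECASE)


def chunk_law(text, max_words=200):
    """Split a law at section headings, then cap each chunk at max_words words."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(text)]
    chunks = []
    for begin, end in zip(bounds, bounds[1:]):
        words = text[begin:end].split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks


def find_clauses(text, is_compliance_clause, max_words=200):
    """Return only the chunks that the supplied classifier flags."""
    return [c for c in chunk_law(text, max_words) if is_compliance_clause(c)]
```

When a law has no recognizable headings, the whole text falls back to fixed-size word windows, which is exactly where the “not machine friendly” problem shows up.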

IBM – Now offered through AlchemyLanguage, IBM’s text analytics has a single feature that analyzes entities and relevance. Before migrating its text analytics features in July 2016, IBM offered a few text analytics options that are no longer available.

This system analyzes factors such as ‘Fear’, ‘Anger’ and ‘Joy’ – not exactly what one would need to analyze legal text. In addition, IBM’s customer service does not really work. Attempts to get access to their system failed even after stubborn emails.

Finally, it should be mentioned that Amazon’s ML platform does not provide any text analytics options.


One would expect that the first step in analyzing legal text would be to use ML text analytics options. This seems like the shortest way towards identifying segments within legislation and the best way to ride the advancements in this field. However, upon testing these ML text analytics abilities, it becomes clear that this is not the answer and that, in its present state of development, ML text analytics is not capable of doing much serious work beyond classifying text as ‘Joy’ or ‘Anger’.

The more ‘simplified’ approach taken by Tensorflow vector representation is much more relevant for the purpose of analyzing legal text and identifying segments in big data even though it is far from the ‘Watson Dream’ where you ‘work with Watson’ and get your text analyzed with the click of the mouse.


Finding Foreign Laws in English

You can read translations of over 750,000 foreign laws using Global-Regulation. Just search for a phrase and click a law from a non-English jurisdiction. A machine translation of the law will be shown on screen and you can click through to see the law in the original language.

ISO codes on the database coverage page

If you go to our coverage page you’ll see a list of our data sources. The bracketed codes to the right of the region title are the ISO codes for the language. The screenshot to the left shows a few examples of this. Note that “zh” is Chinese, “es” is Spanish and “cs” is Czech.

As of mid-October, 2016 we have over 25 languages translated to English.