Machine Learning – Text Analytics Comparison

As part of our work on bringing artificial intelligence, and especially machine learning, into Global-Regulation’s system, we compared the big four providers of ML text analytics: Microsoft, Google, IBM and Amazon. This post follows up on a previous post about an AI-assisted compliance system.

Microsoft – MS ML Studio offers some text analytics options.


Although not particularly helpful for the purpose of identifying segments within legislation, MS ML Studio is the friendliest system among the ML tools in this comparison. It is so friendly that even a user with minimal background in programming and ML can use it (with some patience and a strong will 🙂).

MS ML Studio has a link to new text analytics models, but unfortunately the link is broken.

Google – TensorFlow offers some text analytics features. It is not a friendly tool, and the text analytics options it does offer are vague. However, the vector representation of words may be useful when analyzing legal text and training a model to identify segments within legislation. This is a different approach from the structured text analytics offered by MS and IBM – see below.
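
To illustrate what a vector representation of words means, here is a minimal sketch in plain Python. It does not use TensorFlow: it builds simple co-occurrence count vectors as a stand-in for learned embeddings, and the sentences, function names and window size are all made up for the example. Words that appear in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up sentences, for illustration only (not taken from real legislation).
corpus = [
    "the operator shall submit an annual report",
    "the licensee shall submit an annual report",
    "the minister may appoint a committee",
]
vectors = cooccurrence_vectors(corpus)

# "operator" and "licensee" share their context words; "committee" does not.
sim_same = cosine(vectors["operator"], vectors["licensee"])
sim_diff = cosine(vectors["operator"], vectors["committee"])
```

In this toy corpus "operator" and "licensee" come out as near-identical, while "operator" and "committee" have nothing in common. A learned embedding would capture the same kind of similarity from a much larger body of text.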


In the context of the previous post about an AI-assisted compliance system, TensorFlow’s vector representation may be the solution for the first part of the challenge, i.e., manually identifying compliance clauses and training the model with these clauses. Nonetheless, new challenges arise in the implementation stage, since the system will be able to identify laws that include compliance clauses but not the specific clauses within the law.

Overcoming this challenge will require an additional stage in which the laws are broken into chunks of text before running the model to identify the clauses. As laws are usually not machine friendly, this process creates its own challenges.
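
As a rough sketch of what such a chunking stage might look like, the Python below splits a law into clauses wherever a line starts with a marker such as "1." or "(a)". The marker pattern, the function name and the sample text are assumptions for illustration. Real legislation uses many different numbering schemes, which is exactly where the difficulty lies.

```python
import re

# Assumption: a clause starts on a new line with a marker such as "1." or "(a)".
CLAUSE_START = re.compile(r"^[ \t]*(?:\d+\.|\([a-z]\))[ \t]+", re.MULTILINE)

def split_into_clauses(law_text):
    """Split a law into chunks, one per clause marker, keeping any preamble first."""
    starts = [m.start() for m in CLAUSE_START.finditer(law_text)]
    if not starts:
        stripped = law_text.strip()
        return [stripped] if stripped else []
    chunks = []
    preamble = law_text[:starts[0]].strip()
    if preamble:
        chunks.append(preamble)
    # Each chunk runs from one marker to the next (or to the end of the text).
    for begin, end in zip(starts, starts[1:] + [len(law_text)]):
        chunks.append(law_text[begin:end].strip())
    return chunks

# Made-up sample text, for illustration only.
sample_law = """Sample Act
1. This Act may be cited as the Sample Act.
2. Every operator shall keep records.
(a) Records shall be kept for five years.
"""
clauses = split_into_clauses(sample_law)
```

Each resulting chunk could then be passed to the model separately, so that a match points to a specific clause rather than to the law as a whole.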

IBM – Now offered through AlchemyLanguage, IBM has one text analytics feature, which analyzes entities and relevance. Before migrating its text analytics features in July 2016, IBM offered a few text analytics options that are no longer available.

This system analyzes factors such as ‘Fear’, ‘Anger’ and ‘Joy’ – not exactly what one would need to analyze legal text. In addition, IBM’s customer service does not really work: attempts to get access to their system failed even after persistent emails.

Finally, it should be mentioned that Amazon’s ML platform does not provide any text analytics options.

Conclusion

One would expect that the first step in analyzing legal text would be to use ML text analytics options. This seems like the shortest route to identifying segments within legislation and the best way to ride the advancements in this field. However, upon testing these ML text analytics capabilities, it becomes clear that this is not the answer and that, in its present state of development, ML text analytics is not capable of doing much serious work beyond classifying text as ‘Joy’ or ‘Anger’.

The more ‘simplified’ approach taken by TensorFlow’s vector representation is much more relevant for the purpose of analyzing legal text and identifying segments in big data, even though it is far from the ‘Watson Dream’ where you ‘work with Watson’ and get your text analyzed with the click of a mouse.


Top Word Pairs in Global Laws

As part of our efforts to graph the world’s laws (see previous blog posts), we created a list of the top word pairs used in global laws. Below are the top 25 word pairs. You can download the full Excel file here (contains the top 499 word pairs).

Word Pair | Percentage of Top Word Pairs | No. of Occurrences | Countries With Word Pair as Top Word Pair
order filed | 3.895957924 | 11473490 | United States
statutory authority | 3.0114618 | 8868673 | Antigua and Barbuda, Jamaica, Singapore, Trinidad and Tobago, United States
filed effective | 2.532391836 | 7457825 | United States
repealed order | 0.508982635 | 1498940 | United States
later promulgation | 0.466916032 | 1375055 | United States
final rule | 0.418250047 | 1231735 | United States
social security | 0.378933894 | 1115950 | Antigua and Barbuda, Australia, Brazil, Canada, Chile, China, Czech Republic, Denmark, European Union, Finland, France, Germany, Indonesia, Italy, Jersey, Kenya, Korea, Mexico, New Zealand, Philippines, Poland, Russia, San Marino, South Africa, Spain, Sri Lanka, Sweden, Switzerland, Tonga, Trinidad and Tobago, Turkey, United Kingdom, United States
royal decree | 0.363138502 | 1069433 | Denmark, Italy, Saudi Arabia, Spain
legislative decree | 0.355003282 | 1045475 | Brazil, Denmark, Germany, Italy, Spain
laid down | 0.311219516 | 916533 | Bangladesh, Brazil, Chile, Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, India, Ireland, Italy, Jamaica, Jersey, Mexico, Pakistan, Poland, San Marino, Singapore, Spain, Sweden, Switzerland, Tonga, Turkey, United Kingdom
secretary state | 0.309433082 | 911272 | Brazil, Denmark, Finland, France, Italy, Jersey, San Marino, Spain, Turkey, United Kingdom, United States
member states | 0.299539612 | 882136 | Antigua and Barbuda, Brazil, Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, Ireland, Italy, Jamaica, Jersey, Kenya, Poland, San Marino, Spain, Sweden, Switzerland, Trinidad and Tobago, Turkey, United Kingdom
member state | 0.239937702 | 706610 | Antigua and Barbuda, Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, India, Ireland, Italy, Jamaica, Jersey, Kenya, Poland, San Marino, Spain, Sweden, Switzerland, Trinidad and Tobago, Turkey, United Kingdom
ministerial decree | 0.229651699 | 676318 | Indonesia, Italy
executive order | 0.224945034 | 662457 | Denmark, Philippines, United States
northern ireland | 0.216335447 | 637102 | Czech Republic, European Union, Jersey, Sweden, Switzerland, United Kingdom
from date | 0.193836427 | 570843 | Antigua and Barbuda, Australia, Bangladesh, Bermuda, Botswana, Brazil, Canada, Chile, China, Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, India, Indonesia, Ireland, Italy, Jamaica, Japan, Jersey, Kenya, Korea, Mexico, New Zealand, Pakistan, Philippines, Poland, Russia, San Marino, Singapore, South Africa, Spain, Sri Lanka, Sweden, Switzerland, Tonga, Trinidad and Tobago, Turkey, United Kingdom, United States, Vietnam, Zambia
russian federation | 0.189857104 | 559124 | European Union, Russia, Switzerland, Turkey
authority chapter | 0.186990184 | 550681 | United States
european parliament | 0.183776911 | 541218 | Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, Ireland, Italy, Poland, Spain, Sweden, Switzerland, United Kingdom
local government | 0.169821604 | 500120 | Antigua and Barbuda, Australia, Bangladesh, Botswana, China, Czech Republic, Denmark, Estonia, Finland, Indonesia, Ireland, Japan, Kenya, Korea, Mexico, New Zealand, Pakistan, Philippines, Poland, Russia, South Africa, Spain, Sri Lanka, Sweden, Turkey, United Kingdom, United States, Zambia
published official | 0.165419866 | 487157 | Bangladesh, Brazil, Chile, European Union, France, India, Italy, Mexico, Pakistan, Spain, Turkey
health care | 0.159788235 | 470572 | Australia, Botswana, Brazil, Canada, Chile, China, Czech Republic, Denmark, Estonia, Finland, France, Germany, Indonesia, Italy, Jersey, Korea, Mexico, Philippines, Poland, Russia, San Marino, Singapore, South Africa, Spain, Sweden, Tonga, Turkey, United Kingdom, United States
public service | 0.155587517 | 458201 | Antigua and Barbuda, Australia, Bangladesh, Bermuda, Botswana, Brazil, Canada, Chile, China, Czech Republic, Denmark, Estonia, European Union, Finland, France, Germany, India, Indonesia, Ireland, Italy, Jamaica, Japan, Jersey, Kenya, Korea, Mexico, Pakistan, Philippines, Russia, San Marino, Singapore, South Africa, Spain, Sri Lanka, Sweden, Switzerland, Tonga, Trinidad and Tobago, Turkey, United Kingdom, United States, Zambia
effective date | 0.1554921 | 457920 | Antigua and Barbuda, Australia, Bermuda, Canada, Czech Republic, Finland, Ireland, Italy, Japan, Jersey, Kenya, New Zealand, Pakistan, Singapore, South Africa, Spain, Switzerland, Trinidad and Tobago, Turkey, United Kingdom, United States, Vietnam, Zambia
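
For readers curious how a list like this might be produced, here is a minimal Python sketch that counts adjacent word pairs. It is not the code we ran. The stop-word list, the function names and the way the percentage is computed (each pair's share of the listed pairs' combined count) are assumptions for illustration. Pairs such as "secretary state" suggest that short connecting words were dropped before pairing, which is what the sketch imitates.

```python
import re
from collections import Counter

# Assumption: a small illustrative stop-word list, not the one used for the table.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "by", "is", "be"}

def count_word_pairs(texts):
    """Count adjacent word pairs in each text after dropping stop words."""
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
        counts.update(zip(words, words[1:]))
    return counts

def top_pairs_with_share(counts, n):
    """Return (pair, occurrences, percentage of the top-n total) for the n most common pairs."""
    top = counts.most_common(n)
    total = sum(c for _, c in top)
    return [(" ".join(pair), c, 100.0 * c / total) for pair, c in top]

# Made-up sentences, for illustration only.
texts = [
    "The Secretary of State may by order appoint a day.",
    "An order of the Secretary of State is final.",
]
counts = count_word_pairs(texts)
top = top_pairs_with_share(counts, 1)
```

Running the same counting per country and taking the most common pairs for each would give the last column of the table.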

How To Find World Laws

The majority of legal research is conducted within a single jurisdiction. But in an increasingly global world there’s sometimes a need for legal research that crosses borders. How can searches be conducted across several countries or even the world?

Global-Regulation’s search engine is one answer. It can be used to search for the majority of the digitally published laws (translated into English) for almost 50 countries worldwide. The search engine allows for comparison across jurisdictions and filtering by country/date.

Alternatives to Global-Regulation include:

  • Searching each national legislation portal
  • Using products from LexisNexis and Westlaw that have cross-jurisdiction databases
  • Using the free legal services that aggregate the laws of some regions, such as southern Africa (SAFLII) and Micronesia (PacLII)

San Marino: New Source (Italian)

San Marino legislation is now searchable through Global-Regulation.com. The laws are converted to PDF and then translated from Italian into English.

Here’s an example search for “stamp taxes”: https://www.global-regulation.com/search.php?year&country=San+Marino&province&start&q=stamp+taxes&advanced=false. San Marino has quite a few laws about stamps and is a favoured country of stamp collectors.

Here’s another example for banks: https://www.global-regulation.com/search.php?year&country=San+Marino&province&start&q=bank&advanced=false.
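
The two example links follow the same pattern, so a search URL can be put together programmatically. The Python sketch below simply reproduces the parameter layout visible in those links. The function name is hypothetical, and how the site treats other parameter combinations is an assumption, not documented behaviour.

```python
from urllib.parse import quote_plus

BASE_URL = "https://www.global-regulation.com/search.php"

def build_search_url(query, country=""):
    """Build a search URL in the same shape as the example links above."""
    # Parameter names and order are copied from the example links;
    # empty parameters appear there as bare keys without "=".
    params = [
        ("year", ""),
        ("country", country),
        ("province", ""),
        ("start", ""),
        ("q", query),
        ("advanced", "false"),
    ]
    parts = []
    for key, value in params:
        parts.append(f"{key}={quote_plus(value)}" if value else key)
    return BASE_URL + "?" + "&".join(parts)

stamp_url = build_search_url("stamp taxes", "San Marino")
bank_url = build_search_url("bank", "San Marino")
```

With these inputs the function gives back the two links shown above.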

As with all Global-Regulation searches, clicking the search result sends the user to the official publication.
