Our evolving document understanding technology is built on open-source software and curated datasets that are proprietary, Creative Commons-licensed, or in the public domain. Language models pre-trained on web-scale data (Common Crawl, WebText) are fine-tuned on legal, financial, regulatory and client domain documents to achieve high task accuracy.

Open-Source Software 

Artificial intelligence and natural language processing applications have become commercially practical only in the past decade, with advances in deep learning and the availability of the data and compute resources that deep learning demands.

The following are machine learning, big data, NLP, and deep learning development libraries. We work mostly with Python- and Java-based libraries.


Large open-source, Creative Commons, or public domain datasets are key to training and developing AI/ML/NLP applications.


Regulations are issued by governments and by industry or standards bodies. Listed below are public domain European Union and United States government regulations covering finance, telecommunications and energy.

Explainability, bias and societal issues

We carefully monitor and evaluate the issues of explainability, bias, fairness, the future of work and other societal consequences that the development and use of artificial intelligence pose.


Developing artificial intelligence and natural language processing applications requires teams with business experience, an AI/ML/NLP skill set, and familiarity with the resources available and the issues posed. Andinum is a partner with the expertise necessary to take advantage of the opportunity.

