Your data is piling up but never used? You need data analysis and mining but your data is in free-text form? You are using data entry and manual processes to structure and export your data? You need to organize and search through your documents?
We work with free-form or semi-structured textual data. You have a variety of document formats, and some are scans or images? No worries: we use state-of-the-art OCR and document transformation solutions.
You need to use, analyze, and make predictions based on structured data but all you have is free-form text? Our specialty is Information Extraction, the science of automatically extracting structured information from unstructured text.
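To give a flavor of what turning free-form text into structured information can look like, here is a minimal rule-based sketch in Python. The patterns, the field names, and the function extract_record are hypothetical and chosen purely for illustration; real extraction rules depend on the documents at hand, and this is not a description of any production system.

```python
import re

# Hypothetical patterns for illustration only; real rules depend on the data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")                 # ISO-style dates only
AMOUNT_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")      # US dollar amounts only


def extract_record(text):
    """Return a dict of structured fields found in a free-form text string."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
        # Strip thousands separators before converting to a number.
        "amounts": [float(a.replace(",", "")) for a in AMOUNT_RE.findall(text)],
    }


sample = "Invoice sent to jane.doe@example.com on 2014-03-07 for $1,250.00."
record = extract_record(sample)
print(record)
```

Each key in the resulting dictionary corresponds to a column that could be exported, searched, or fed into further analysis.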
We value common sense and efficiency above complicated research solutions. After analyzing your dataset and problem we will suggest the most efficient approach: rule-based, machine learning, or a combination of the two.
We offer one-time text processing, SaaS solutions with data model updates, or in-house software for private and sensitive data.
Some problems are easier than others. Before we embark on a solution, we analyze your data and create a scientific performance estimation model. Statistics don't lie: you will know what to expect.
We use algorithmic document classification techniques to provide automatic document categorization for electronic discovery and routing, sentiment analysis, email routing, and spam filtering.
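As one simple example of such a technique, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing, applied to a toy spam-filtering task. The class name, the whitespace tokenizer, and the four training sentences are assumptions made up for this illustration; a real project would use a far larger labeled dataset and possibly a different algorithm.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.class_counts[label] += 1
            for tok in tokenize(doc):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
train_docs = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

classifier = NaiveBayesClassifier().fit(train_docs, train_labels)
print(classifier.predict("claim your free prize"))
print(classifier.predict("agenda for the monday meeting"))
```

The same fit and predict pattern carries over to routing or sentiment tasks by changing the label set.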
We provide various performance measures on a statistically representative sample of your data. Both rule-based and machine learning solutions are thoroughly documented.
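Precision, recall, and F1 are among the most common measures of this kind. The sketch below computes them for one label over a small hand-labeled sample; the function name and the sample labels are invented for illustration, and drawing a statistically representative sample is a separate step that is not shown here.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall, and F1 for one label of interest.

    gold and predicted are equal-length sequences of labels.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against division by zero when a label never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and system output for six sampled documents.
gold_labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
system_labels = ["spam", "ham", "ham", "spam", "spam", "ham"]

precision, recall, f1 = precision_recall_f1(gold_labels, system_labels, "spam")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting all three numbers, rather than accuracy alone, makes it visible whether a system errs by missing items or by raising false alarms.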
Manual data cleansing, quality assurance, and training data generation are provided via managed crowdsourcing or in-house personnel.
We offer consulting services around small, medium, and big textual data. Our specialties are Natural Language Processing, Machine Learning, and Information Extraction. We are a team of NLP researchers (PhDs), experienced polyglot software engineers, and data QA specialists.
Our infrastructure of NLP and ML tools allows us to quickly build both prototypes and production-ready applications. We have built solutions using established NLP frameworks, as well as a variety of open-source and in-house ML algorithm implementations.
We are polyglots and have developed projects using Java, Python, Scala, and Clojure.
We are familiar with most available third-party NLP solutions. When applicable, we evaluate the performance of existing solutions (e.g. AlchemyAPI, Textalytics, LingPipe).
We extract and clean text from a variety of document formats (e.g. PDF, HTML, RTF, Word). We have built an infrastructure for manual data cleansing and training data generation.