Is your data piling up but never used? Do you need data analysis and mining, but your data is in free-text form? Are you relying on data entry and manual processes to structure and export your data? Do you need to organize and search through your documents?
We work with free-form or semi-structured textual data. Do you have a variety of document formats, including scans and images? No worries: we use state-of-the-art OCR and document transformation solutions.
Do you need to use, analyze, and make predictions from structured data, but all you have is free-form text? Our specialty is Information Extraction, the science of automatically extracting structured information from unstructured text.
We value common sense and efficiency above complicated research solutions. After analyzing your dataset and your problem, we will suggest the most efficient approach: rule-based, machine learning, or a combination of the two.
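To make the rule-based option concrete, here is a minimal sketch of an extractor built from named patterns. It is illustrative only: the field names, the regular expressions, and the invoice-style sample text are all assumptions chosen for the example, not rules from a real project.

```python
import re

# Hypothetical rules: each field name maps to a pattern with one capture group.
RULES = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(
        r"total\s*(?:due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of field -> first matched value, or None if no rule fires."""
    record = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    return record


# Made-up sample text for illustration.
sample = "Invoice no. INV-2041 issued on 2023-05-17. Total due: $1,250.00, payable in 30 days."
record = extract_fields(sample)
print(record)
```

Rules like these are easy to read and audit, which is why they are often a sensible first step before any machine learning is considered.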
We offer one-time text processing, SaaS solutions with data model updates, or in-house software for private and sensitive data.
Some problems are easier than others. Before we embark on a solution, we analyze your data and create a scientific performance estimation model. Statistics don't lie: you will know what to expect.
We use algorithmic document classification techniques and provide solutions for automatic document categorization for electronic discovery and routing, sentiment analysis, email routing, and spam filtering.
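As a rough illustration of document classification for routing or spam filtering, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is just one common baseline chosen for the example. The class name `NaiveBayesRouter`, the labels, and the training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data for illustration.
router = NaiveBayesRouter()
router.train([
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "inbox"),
    ("please review the attached invoice", "inbox"),
])
print(router.predict("claim your free prize"))
print(router.predict("review the monday agenda"))
```

A baseline this simple is mainly useful as a reference point: if a more complex model cannot beat it on held-out data, the extra complexity is hard to justify.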
We provide various performance measures on a statistically representative sample of your data. Both rule-based and machine learning solutions are thoroughly documented.
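Typical performance measures for a classification or extraction task include precision, recall, and F1. The sketch below computes them for one class on a labeled sample. The function name and the toy labels are assumptions for illustration, and in practice the sample would be drawn so that it is representative of the full dataset.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy labels for illustration: human annotations vs. system output.
gold = ["spam", "spam", "inbox", "inbox", "spam", "inbox"]
predicted = ["spam", "inbox", "inbox", "spam", "spam", "inbox"]

precision, recall, f1 = precision_recall_f1(gold, predicted, positive="spam")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the system finds two of the three spam messages and raises one false alarm, so precision, recall, and F1 all come out to two thirds.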
Semi-automated data cleansing, quality assurance, and training data generation are provided via managed crowdsourcing or in-house personnel.
We offer consulting services around small, medium, and big textual data. Our specialties are Natural Language Processing, Machine Learning, and Information Extraction. We are a team of NLP researchers (PhDs), experienced polyglot software engineers, and data QA specialists.
Our infrastructure of NLP and ML tools allows us to quickly build both prototypes and production-ready applications. We have built solutions using established NLP frameworks, as well as a variety of open-source and in-house ML algorithm implementations.
We are full-stack developers and polyglots and have developed projects using Python, Java, Scala, Clojure, and C#.
We are familiar with most available third-party NLP solutions and pre-trained ML models. When applicable, we evaluate the performance of existing solutions.
We extract and clean text from a variety of document formats (e.g. PDF, HTML, RTF, Word, images, and scans). We have built an infrastructure for semi-automated data cleansing and training data generation.
While the majority of our work is confidential, a few clients requested academic publications or patent applications:
• Combining structured and free-text electronic medical record data for real-time clinical decision support
• Towards reliable ARDS clinical decision support: ARDS patient analytics with free-text and structured EMR data
• Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
• Free-text and structured clinical time series for patient outcome predictions
• Patient Context Vectors: Low Dimensional Representation of Patient Context Towards Enhanced Rule Engine Semantics and Machine Learning
• Disease specific ontology-guided rule engine and machine learning for enhanced critical care decision support
• Toward Automated Early Sepsis Alerting: Identifying Infection Patients from Nursing Notes
• Open Globe Injury Patient Identification in Warfare Clinical Notes
• Combining Visual and Textual Features for Information Extraction from Online Flyers