DotData extracts features from key data to make machine learning useful



Many artificial intelligence experts say that running the AI algorithm is only part of the job. Preparing and cleaning the data is a start, but the real challenge is knowing what to study and where to look for the answer. Is it hidden in the transaction ledger? Or maybe in a color pattern? Finding the right features for the AI algorithm to examine often requires deep knowledge of the business itself, so that the algorithms are guided to search in the right place.

DotData wants to automate this work. The company aims to help businesses identify the best features for AI processing and find the best place to look for them. It has launched DotData Py Lite, a containerized version of its machine learning toolkit that lets users quickly build proofs of concept (POCs). Data owners looking for answers can either download the toolkit and run it locally or run it in DotData’s cloud service.

VentureBeat spoke with DotData Founder and CEO Ryohei Fujimaki to discuss the new product and its role in the company’s larger approach to simplifying AI workloads for anyone with more data than time.

VentureBeat: Do you see your tool more as a database or an AI engine?

Ryohei Fujimaki: Our tool is not just an AI engine; it is [tightly integrated with] the data. There are three main stages of data in many businesses. First there is the data lake, which is mostly raw data. Then there is the data warehouse stage, where the data is somewhat cleaned up and architected. It is in good condition, but it is not yet easily consumable. Then there is the data mart, which is a set of purpose-specific data tables. It is easily consumed by a business intelligence tool or a machine learning algorithm.

We start working with the data between the data lake and the data warehouse stage. [Then we prepare it] for machine learning algorithms. Our real core competence is automating this process.
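The three stages Fujimaki describes can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not DotData's actual code; all table layouts and function names here are hypothetical.

```python
# Three hypothetical data stages: raw "lake" records, a cleaned "warehouse"
# table, and an ML-ready "mart" of per-customer features.

RAW_LAKE = [  # raw, messy event records as they land in the data lake
    {"customer": "a1", "amount": "19.99", "ts": "2021-03-01"},
    {"customer": "a1", "amount": "5.00",  "ts": "2021-03-02"},
    {"customer": "b2", "amount": None,    "ts": "2021-03-02"},  # unusable row
]

def to_warehouse(lake_rows):
    """Clean and type the raw rows: drop bad records, parse amounts."""
    out = []
    for r in lake_rows:
        if r["amount"] is None:
            continue  # data cleansing: skip unusable records
        out.append({"customer": r["customer"],
                    "amount": float(r["amount"]),
                    "ts": r["ts"]})
    return out

def to_mart(warehouse_rows):
    """Aggregate into one ML-consumable feature row per customer."""
    mart = {}
    for r in warehouse_rows:
        feats = mart.setdefault(r["customer"], {"n_txns": 0, "total_spend": 0.0})
        feats["n_txns"] += 1
        feats["total_spend"] += r["amount"]
    return mart

mart = to_mart(to_warehouse(RAW_LAKE))
print(mart["a1"])  # one feature vector per customer, ready for a model
```

The point of the sketch is where DotData claims to sit: between the first and second stage, automating the transformations that produce the third.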

VentureBeat: The process of finding the right pieces of data in a vast sea?

Fujimaki: We think of it as “feature engineering”: starting from the raw data, somewhere between the data lake and the data warehouse stage, doing a lot of data cleansing, and then feeding a machine learning algorithm.

VentureBeat: Machine learning helps find important features?

Fujimaki: Yes. Essentially, feature engineering is solving a machine learning problem based on domain expertise.

VentureBeat: Is it working well?

Fujimaki: One of our best customer case studies is from a subscription management company. The company uses its platform to manage customer subscriptions. The problem is that there are a lot of declined or delayed transactions. It is a problem of almost $300 million for them.

Prior to DotData, they manually designed 112 queries to create a feature set based on the original 14 columns of a table. Their accuracy was around 75%. But we took seven tables from their dataset and discovered 122,000 candidate features. Accuracy jumped to over 90%.

VentureBeat: So the manually discovered features were good, but your machine learning found a thousand times more features and the accuracy jumped?

Fujimaki: Yes. That accuracy is just the technical improvement. In the end, they could avoid nearly 35% of bad transactions. That is almost $100 million.

We went from 14 columns in one table to almost 300 columns across seven tables. Our platform identifies the most promising and significant features, and by using those important features they could improve accuracy very substantially.
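The jump from 112 hand-written queries to 122,000 discovered candidates comes from combinatorics: columns, aggregation operators, and time windows multiply. A toy enumeration (not DotData's algorithm; the column names, operators, and windows are made up) shows how quickly the space grows even before a ranking step prunes it:

```python
# Enumerate candidate aggregation features over hypothetical inputs.
from itertools import product

columns = ["amount", "retries", "latency"]        # hypothetical numeric columns
aggregations = ["sum", "mean", "max", "count"]    # candidate aggregation ops
windows_days = [7, 14, 28, 90]                    # candidate look-back windows

candidates = [f"{agg}({col}) over last {w}d"
              for col, agg, w in product(columns, aggregations, windows_days)]

print(len(candidates))  # 3 columns * 4 ops * 4 windows = 48 candidates
```

With 300 columns across seven joinable tables, the same multiplication easily reaches the hundred-thousand scale Fujimaki cites, which is why the ranking of "most promising" features matters as much as the enumeration.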

VentureBeat: So what kind of features does it discover?

Fujimaki: Let’s look at another case study, in product demand forecasting. The features discovered are very, very simple. Machine learning uses time aggregation over tables of transactions, such as sales over the last 14 days. Obviously, that is something that could affect demand for products next week. For housewares sales, the machine learning algorithm found that a 28-day window was the best predictor.
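The time-aggregation feature described here can be sketched in a few lines. This is a minimal illustration, assuming synthetic daily sales data; the dates, values, and function name are invented for the example:

```python
# Trailing-window sales aggregation: the kind of simple time feature
# (14-day vs. 28-day look-back) the interview describes.
from datetime import date, timedelta

# Hypothetical daily sales totals keyed by date (30 days of rising sales).
sales = {date(2021, 3, 1) + timedelta(days=i): 100 + 10 * i for i in range(30)}

def trailing_sum(daily, as_of, window_days):
    """Sum sales over the `window_days` days ending at `as_of` (inclusive)."""
    return sum(daily.get(as_of - timedelta(days=i), 0) for i in range(window_days))

as_of = date(2021, 3, 30)
feat_14d = trailing_sum(sales, as_of, 14)  # candidate feature: 14-day window
feat_28d = trailing_sum(sales, as_of, 28)  # candidate feature: 28-day window
print(feat_14d, feat_28d)
```

An automated system would generate both window lengths (and many more) as candidate features and let the model-selection step decide, per product category, which one predicts best.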

VentureBeat: Is it a single window?

Fujimaki: Our engine can automatically detect sales trends specific to a household item. This is called a partial or annual periodic pattern. The algorithm will detect annual periodic patterns that are particularly important for seasonal events like Christmas or Thanksgiving. In this use case, there is a lot of payment history, a very rich history.
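One common way to expose annual periodicity to a model is to encode the day of year as sine/cosine coordinates, so dates near the same seasonal event in different years map to nearby feature values. This is a standard trick offered as a sketch, not necessarily how DotData detects periodic patterns:

```python
# Encode a date's position in the annual cycle as a point on the unit circle.
import math
from datetime import date

def annual_cycle_features(d):
    """Map a date onto the unit circle of the ~365.25-day annual cycle."""
    angle = 2 * math.pi * d.timetuple().tm_yday / 365.25
    return math.sin(angle), math.cos(angle)

dec24 = annual_cycle_features(date(2020, 12, 24))
dec26 = annual_cycle_features(date(2021, 12, 26))  # different year, same season
print(dec24, dec26)  # the two points land close together on the circle
```

With this encoding, a model can learn that demand spikes near a particular point on the circle (say, late December) without any hand-written holiday rules.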

VentureBeat: How hard is it to find good data?

Fujimaki: There is often a lot of it, but it is not always good. Some industrial customers are studying their supply chains. I love this case study of a manufacturing company. They analyze sensor data using DotData, and there is a lot of it. They want to detect certain failure patterns or maximize the efficiency of the manufacturing process. We support them by deploying our prediction engine on streams from the [internet of things] factory sensors.

VentureBeat: Your tool saves humans from searching for and trying to imagine all of these combinations. It should make the practice of data science easier.

Fujimaki: Traditionally, this type of feature engineering required a lot of data engineering skills because the data is very large and there are many combinations.

Most of our users today are not data scientists. There are a few profiles. One is a [business intelligence] type of user: a visualization expert who builds dashboards for descriptive analytics and wants to move into predictive analytics.

Another is a data engineer or systems engineer who is familiar with this kind of data model concept. Systems engineers can easily understand and use our tool to do machine learning and AI. There is growing interest among data scientists themselves, but our main product is primarily useful for these types of people.

VentureBeat: Do you automate the discovery process?

Fujimaki: Basically, our customers are very, very surprised when we show that we automate this feature extraction. It is the most complex and time-consuming part. People usually said it was impossible to automate because it requires a lot of domain knowledge. But we can automate that part. We can automate the data manipulation that comes before machine learning.

VentureBeat: So it’s not just about finding the best features, but the work upstream: the work of identifying the features themselves.

Fujimaki: Yes! We use AI to generate the input to AI. There are a lot of players who can automate the final machine learning step. Most of our customers chose DotData because we can automate the search for features first. That part is kind of our secret sauce, and we’re very proud of it.



