Tuesday, September 22, 2020

Data Science Study Notes: Automatic Machine Learning (AutoML)

Automated Machine Learning (AutoML) is one of the hottest topics in data science today, but what does it mean?

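Why is random search so much faster than grid search, while still not missing the optimal point? Here is the beautiful formula behind it:
1 - (1 - 0.05)^n >= 0.95
Where does the formula come from? Suppose the optimal region is the best 5% of the search space (the usual choice for this argument). Each independent random sample misses that region with probability 1 - 0.05, so all n samples miss it with probability (1 - 0.05)^n, and at least one of them lands in it with probability 1 - (1 - 0.05)^n.
Now apply the typical condition, a 95% chance of landing in the optimal region, and solve for n. The smallest n that works is 59, so as long as we randomly sample 60 times, we have at least a 95% chance of landing in the optimal region: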
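The 60-sample figure can be checked numerically. Below is a minimal sketch, assuming the optimal region covers 5% of the search space and that samples are drawn independently and uniformly. The function names (`min_trials`, `hit_probability`, `simulate`) are made up for this illustration and do not come from any AutoML library.

```python
import math
import random


def min_trials(region_fraction=0.05, confidence=0.95):
    """Smallest n such that 1 - (1 - region_fraction)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - region_fraction))


def hit_probability(n, region_fraction=0.05):
    """Chance that at least one of n uniform random samples lands in the region."""
    return 1 - (1 - region_fraction) ** n


def simulate(n, region_fraction=0.05, runs=20000, seed=0):
    """Monte Carlo estimate of the same probability (assumed uniform sampling)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < region_fraction for _ in range(n))
        for _ in range(runs)
    )
    return hits / runs


if __name__ == "__main__":
    print("minimum trials:", min_trials())
    print("exact probability with 60 trials:", round(hit_probability(60), 4))
    print("simulated probability with 60 trials:", round(simulate(60), 4))
```

Note that the number of trials does not depend on how many hyperparameters there are, only on how large the target region is relative to the whole search space.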
Here is a graph that helps explain why random search is likely to reach the optimal region in far fewer trials. It comes down to the fact that usually only a few hyperparameters really matter to the model. A grid search keeps re-testing the same few values of those important hyperparameters while it varies the unimportant ones, so most of its trials are wasted. Random search tries a fresh value of every hyperparameter on every trial:
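The same point can be seen by counting how many distinct values of the important hyperparameter each strategy actually tries. This is a small sketch with two hypothetical hyperparameters on the unit square, where `x` is assumed to be the important one and `y` the unimportant one; both searches get the same budget of 9 trials.

```python
import itertools
import random


def grid_points(per_axis):
    """Evenly spaced grid on the unit square: per_axis * per_axis trials."""
    axis = [(i + 0.5) / per_axis for i in range(per_axis)]
    return list(itertools.product(axis, axis))


def random_points(n, seed=0):
    """n uniform random trials on the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


grid = grid_points(3)      # 3 x 3 = 9 trials
rand = random_points(9)    # 9 trials, same budget

# Count how many different values of the important hyperparameter x were tried.
distinct_x_grid = len({x for x, _ in grid})
distinct_x_rand = len({x for x, _ in rand})

print("grid search:   trials =", len(grid), " distinct x values =", distinct_x_grid)
print("random search: trials =", len(rand), " distinct x values =", distinct_x_rand)
```

With the same 9 trials, the grid explores only 3 values of `x`, while random search explores 9, which is why it covers the important dimension so much better.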
Here is a video from Danny Leybzon, who has an academic background in computational statistics and is a grandmaster in AutoML.
