Tuesday, September 22, 2020

Data Science Study Notes: Automatic Machine Learning (AutoML)

Hello OC Advanced Analytics and Big Data Meetupers,

After taking some time off in the summer (hope you had a great time with family and friends), we're back to more awesome learning. We'll have 3 sessions, and 2 of them will run simultaneously: one will be technical and targeted at those with advanced data science knowledge, while the other will be far less technical and will serve as an introduction to data science.

The first session will be for all attendees: we'll be learning about AutoML, a trending area of machine learning that makes it easier for any of us to successfully create good data science models.

Automated Machine Learning (AutoML) is one of the hottest topics in data science today, but what does it mean? In this workshop, Danny D. Leybzon (a seasoned data scientist and Solutions Architect at Qubole) will give a broad overview of AutoML, ranging from simple hyperparameter optimization all the way to full pipeline automation. After going over the theoretical framework and explanation of AutoML, he will dive into concrete examples of different types of AutoML. Throughout the presentation, Danny will leverage Apache Spark (a framework popular with data scientists who need to scale their machine learning workloads to Big Data) and Apache Zeppelin notebooks, as well as popular Python libraries such as Pandas, Plotly, and bayes-opt. Participants will walk away from this workshop with in-depth knowledge of hyperparameter tuning (using grid search, random search, Bayesian optimization, and genetic algorithms) and will have been exposed to new tools for automating their machine learning workflows.

About our sponsor, Qubole: Qubole delivers an autonomous, self-service platform for Big Data analytics built on the Amazon Web Services, Microsoft, and Google clouds. Qubole was started by the team that built and ran Facebook's Data Service, where they founded and authored Apache Hive. With Qubole, a data scientist can spin up hundreds of clusters on their public cloud of choice and begin creating ad hoc and/or batch queries in 3 minutes. Qubole is used by many leading firms for end-to-end data processing, and takes away the burdens of scalability and administration.

About our speaker, Danny: Danny has an academic background in computational statistics. He believes that good data science requires good data engineering in order to create clean, accurate, and accessible data for data scientists. In the past, he's given presentations on distributed deep learning, productionizing machine learning models, and the importance of big data for machine learning in the modern world.

The following session is for those with advanced technical data science knowledge: Bayesian Inference in Machine Learning. This talk introduces Bayesian inference and its use cases in data science. Markov Chain Monte Carlo (MCMC) will be discussed as a technique for evaluating analytically intractable integrals, which are a common obstacle when implementing Bayesian inference. The talk will start with an introduction (what it is), follow with how MCMC works, and end with when it's useful. The focus will be on concepts rather than the math.
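To make the MCMC idea above concrete, here is a minimal sketch (my own illustration, not material from the talk) of a random-walk Metropolis-Hastings sampler in plain Python. The target density, the step size, and names such as `metropolis_hastings` and `log_target` are assumptions chosen for the example. Instead of evaluating the integral for the mean analytically, the code averages samples drawn from an unnormalized density.

```python
import math
import random


def metropolis_hastings(log_density, n_samples, start=0.0, step=1.0, seed=0):
    """Draw samples from an unnormalized density with random-walk Metropolis-Hastings."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current point.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_target(x):
    # Assumed toy target: unnormalized log density of a normal
    # distribution with mean 2 and standard deviation 1.
    return -0.5 * (x - 2.0) ** 2


samples = metropolis_hastings(log_target, n_samples=20000)
kept = samples[2000:]  # discard the early "burn-in" samples
estimated_mean = sum(kept) / len(kept)
print(f"Estimated mean: {estimated_mean:.3f}")
```

The sampler only needs the density up to a constant factor, which is why it is handy when the normalizing integral in Bayes' rule cannot be computed in closed form.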
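Two of the hyperparameter tuning strategies named in the AutoML session, grid search and random search, can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the code from the workshop: `validation_loss` is a hypothetical stand-in for "train a model and score it on held-out data", and the parameter names and ranges are made up for the example. Bayesian optimization and genetic algorithms are not shown here.

```python
import itertools
import random


def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(objective, grid):
    """Evaluate every combination of the listed values and return the best one."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def random_search(objective, sample_params, n_trials, seed=0):
    """Evaluate n_trials randomly sampled settings and return the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def sample_params(rng):
    # Learning rate drawn log-uniformly from [0.001, 1]; depth uniformly from 2 to 10.
    return {"learning_rate": 10 ** rng.uniform(-3, 0), "depth": rng.randint(2, 10)}


# Assumed search space for the example.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 6, 10]}
grid_best, grid_loss = grid_search(validation_loss, grid)
random_best, random_loss = random_search(validation_loss, sample_params, n_trials=50)

print("Grid search best:", grid_best, grid_loss)
print("Random search best:", random_best, random_loss)
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search keeps a fixed budget of trials, which is one reason it is often preferred when only a few hyperparameters really matter.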


Python Study Notes: how to load stock data, manipulate data, find patterns for profit?

#================================================ from pandas_datareader import data as pdr #run the upgrade if see error: pandas_dataread...