Update ML

cms-opendata-workshop · Jul 19, 2024 · 7e31961
1 parent b23ea66
commit 7e31961
Showing 5 changed files with 50 additions and 17 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/episodes/03-ml-1.md b/episodes/03-ml-1.md
@@ -35,10 +35,27 @@ Before diving into ML in HEP, participants should have a basic understanding of:
 - Data handling and visualization
 - Elementary statistical concepts (mean, variance, etc.)
 
-### Detailed Concepts and Steps
+## Let's get the basics clear 
+
+[Machine learning](https://www.ibm.com/topics/machine-learning) (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy. If that is not clear, please watch [this video](https://www.youtube.com/watch?v=4RixMPF4xis).
+
+![AI vs ML](ml-vs-ai.png)
+
+::::::::::::::::::::::::::::::::::::: callout 
+
+Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence. More precisely, neural networks are a sub-field of machine learning, and deep learning is a sub-field of neural networks.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+### Neural Networks
+
+For an overview of neural networks, visit [3Blue1Brown's basics of neural networks, and the math behind how they learn](https://www.3blue1brown.com/lessons/neural-networks).
 
 ## Data Acquisition and Understanding
 
+By now you should have a basic understanding of how machine learning works. To apply it in High Energy Physics, you also need the following basics.
+
 ### CMS Open Data Overview
 - Accessing and understanding the CMS Open Data repository.
 - Types of datasets available (e.g., AOD, MiniAOD, NanoAOD) and their differences.
@@ -55,29 +72,33 @@ As you dive into the hackathon, keep in mind that feature engineering—like sel
 
 ![supervised vs unsupervised learning](../fig/s-vs-us.png)
 
+You can get a glimpse of the differences in [this video](https://www.youtube.com/watch?v=rHeaoaiBM6Y).
+
 ## Supervised Learning in HEP
 
 ### Basics of Supervised Learning
 - Understanding labeled datasets and target variables.
 - Classification tasks: distinguishing particle types (e.g., muons, electrons).
-- Regression tasks: predicting particle properties (e.g., energy, momentum).
+- Regression tasks: in HEP, one possible application is predicting particle properties (e.g., energy, momentum).
 
 ### Model Selection and Training
-- Choosing appropriate algorithms (e.g., Decision Trees, Random Forests, Neural Networks).
-- Cross-validation techniques to optimize model performance.
-- Hyperparameter tuning to fine-tune model behavior.
+- [Choosing appropriate algorithms](https://www.simplilearn.com/10-algorithms-machine-learning-engineers-need-to-know-article) (e.g., Decision Trees, Random Forests, Neural Networks).
+- [Cross-validation techniques](https://www.turing.com/kb/different-types-of-cross-validations-in-machine-learning-and-their-explanations) to optimize model performance.
+- [Hyperparameter tuning](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview) to fine-tune model behavior.
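
The cross-validation idea in the list above can be sketched without any ML library. This is a minimal illustration, not the workshop's method: the toy one-dimensional data, the threshold "model", and the helper names (`k_fold_indices`, `fit_threshold`, `cross_validate`) are all assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy 'model' (hypothetical): the midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def accuracy(threshold, xs, ys):
    """Fraction of samples where 'x > threshold' agrees with the label."""
    correct = sum((x > threshold) == bool(y) for x, y in zip(xs, ys))
    return correct / len(xs)

def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        thr = fit_threshold([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(accuracy(thr, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return scores

# Toy data: class 0 centred at 0.0, class 1 centred at 3.0 (both with unit width).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
ys = [0] * 100 + [1] * 100

scores = cross_validate(xs, ys, k=5)
mean_score = sum(scores) / len(scores)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

The spread of the per-fold scores gives a rough idea of how much the performance estimate depends on the particular train/test split. Hyperparameter tuning typically repeats this loop for each candidate setting and keeps the one with the best mean score.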
 
 ### Model Evaluation
-- Metrics: accuracy, precision, recall, F1-score.
-- Confusion matrices and ROC curves for performance visualization.
-- Interpreting results and refining models based on feedback.
+- [Metrics](https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234): accuracy, precision, recall, F1-score.
+- Confusion matrices and [ROC curves](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc) for performance visualization.
+- Interpreting results and refining models based on feedback: watch [this video](https://www.youtube.com/watch?v=nt5DwCuYY5c&t) for an explanation of learning curves in machine learning.
+
+![Confusion metrics](../fig/metrics.png)
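
The metrics listed above all follow from the four cells of a binary confusion matrix. The sketch below computes them by hand; the label lists are made-up toy values and the function names are hypothetical, chosen only for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 4 true signal events (1) and 6 true background events (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 5)
print(metrics)
```

In this toy case one signal event is missed and one background event is misidentified, so precision and recall are both 3/4 while accuracy is 8/10. On imbalanced samples, accuracy alone can look good even when precision or recall is poor.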
 
 ## Unsupervised Learning in HEP
 
 ### Basics of Unsupervised Learning
-- Clustering algorithms (K-means, DBSCAN) for grouping similar events.
+- [Clustering algorithms](https://cloud.google.com/discover/what-is-unsupervised-learning?hl=en#) (K-means, DBSCAN) for grouping similar events.
 - Anomaly detection techniques to identify unusual data points.
-- Dimensionality reduction methods (PCA, LDA) for visualizing high-dimensional data.
+- [Dimensionality reduction](https://www.ibm.com/topics/dimensionality-reduction) methods (PCA, LDA) for visualizing high-dimensional data.
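
As a rough illustration of the clustering item above, here is a minimal K-means sketch in plain Python. The two Gaussian blobs stand in for "similar events" and are invented for the example; the function names, the fixed number of iterations, and the random initialisation from data points are assumptions, not a reference implementation.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_labels(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def k_means(points, k, n_iter=20, seed=0):
    """Alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        labels = assign_labels(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assign_labels(points, centroids)

# Toy data: two well-separated blobs around (0, 0) and (5, 5).
rng = random.Random(7)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = k_means(points, k=2)
print("centroids:", sorted(centroids))
```

No labels are used anywhere: the grouping comes only from distances between points, which is what makes this an unsupervised method. Real analyses would usually scale the features first and try several initialisations.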
 
 ### Applications in Particle Physics
 - Discovering new particles through anomaly detection.

diff --git a/episodes/04-ml-2.md b/episodes/04-ml-2.md
@@ -20,11 +20,11 @@ exercises: 0
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-## Practical Application of Machine Learning in Particle Physics
+# Practical Application of Machine Learning in Particle Physics
 
 Machine learning techniques, such as Convolutional Neural Networks (CNNs) and autoencoders, play pivotal roles in analyzing particle physics data. This section provides insights into their architectures, training processes, and practical applications within the field.
 
-### Convolutional Neural Networks (CNNs)
+## Convolutional Neural Networks (CNNs)
 
 #### Purpose and Architecture
 
@@ -35,7 +35,13 @@ CNNs are specialized neural networks designed for processing grid-like data, suc
 - **Training**: CNNs learn through backpropagation, adjusting weights to minimize classification error or regression loss.
 - **Applications**: In particle physics, CNNs are used to classify particle types, analyze detector images for anomalies, and segment regions of interest in collision data.
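
To make the "grid-like data" idea concrete, the sketch below applies a single convolution filter to a tiny image in plain Python. It is an illustration only: the 4x4 image and the edge-detecting kernel are made up, and the operation is the unflipped sliding-window product (cross-correlation) that deep-learning frameworks conventionally call convolution. In a real CNN the kernel values are learned by backpropagation rather than written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

def relu(feature_map):
    """Element-wise ReLU activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy 4x4 "detector image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written 2x2 kernel that responds where intensity rises from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
for row in feature_map:
    print(row)  # each row is [0.0, 2.0, 0.0]
```

The filter responds only in the middle column, where the vertical edge sits. A CNN stacks many such filters and layers so that later layers respond to increasingly complex patterns.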
 
-### Autoencoders
+#### Practical Example
+
+This project explores deep learning techniques in high-energy physics using data from the CMS experiment at the LHC. The repository includes analyzers, scripts, and notebooks to process collision data and train convolutional neural networks (CNNs) for particle classification. By transforming collision data into images and trying various CNN architectures, the project aims to classify high-energy particle collision outcomes with high accuracy. The `poet_realdata.py` script and the `MuonAnalyzer_realdata.cc` analyzer are based on the configuration and analyzers used in the final version of the [CMS Open Data Workshop 2022](https://cms-opendata-workshop.github.io/2023-07-11-cms-open-data-workshop/), so the analysis techniques and tools in this project stay consistent with those taught during the workshop.
+
+Link to repo: [GitHub](https://github.com/xaviertintin/cnn-hep-thesis/tree/main)
+
+## Autoencoders
 
 #### Purpose and Architecture
 
@@ -46,7 +52,13 @@ Autoencoders are unsupervised learning models that learn efficient data represen
 - **Training**: Autoencoders minimize reconstruction error during training, optimizing parameters to improve data reconstruction quality.
 - **Applications**: In particle physics, autoencoders are used to denoise detector data, detect rare events or anomalies in experimental data, and extract meaningful features for subsequent analysis.
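
The training and anomaly-detection points above can be sketched with a deliberately tiny model. This is an assumption-laden toy, not the architecture of any project referenced in this lesson: a linear autoencoder with tied weights compresses 2D points to one latent value, is trained by plain gradient descent on made-up "background" data lying near the line y = x, and flags inputs whose reconstruction error exceeds the largest error seen in training.

```python
import random

def encode(w, x):
    """Encoder: project the 2D input onto w to get one latent value."""
    return w[0] * x[0] + w[1] * x[1]

def reconstruct(w, x):
    """Decoder with tied weights: map the latent value back to 2D."""
    z = encode(w, x)
    return (z * w[0], z * w[1])

def reconstruction_error(w, x):
    """Squared distance between the input and its reconstruction."""
    x_hat = reconstruct(w, x)
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2

def train(data, lr=0.02, epochs=500):
    """Minimise the mean reconstruction error with batch gradient descent."""
    w = [1.0, 0.0]  # fixed starting point, chosen only for reproducibility
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            r = (x[0] - z * w[0], x[1] - z * w[1])  # residual x - x_hat
            r_dot_w = r[0] * w[0] + r[1] * w[1]
            for k in range(2):
                # d/dw_k of |x - (w.x) w|^2, averaged over the data set
                grad[k] += -2.0 * (r_dot_w * x[k] + z * r[k]) / n
        w = [w[k] - lr * grad[k] for k in range(2)]
    return w

# Toy "background": points scattered closely around the line y = x.
rng = random.Random(1)
background = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    background.append((t, t + rng.gauss(0.0, 0.1)))

w = train(background)

# Simple (hypothetical) working point: the largest error seen on the training data.
threshold = max(reconstruction_error(w, x) for x in background)

background_like = (1.0, 1.0)   # follows the pattern the model was trained on
anomaly = (1.0, -1.0)          # breaks that pattern
is_anomaly = reconstruction_error(w, anomaly) > threshold
print("background-like error:", reconstruction_error(w, background_like))
print("anomaly error:", reconstruction_error(w, anomaly))
print("flagged as anomaly:", is_anomaly)
```

The model never sees an anomalous example during training. It only learns to reconstruct background well, so anything it reconstructs poorly stands out. Practical autoencoders use several nonlinear layers and a threshold chosen from a target false-positive rate rather than the training maximum.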
 
-### Key Differences
+#### Practical Example
+
+The QCD School 2024 ML project is an educational initiative designed to introduce participants to the application of machine learning in high energy physics, specifically through anomaly detection using unsupervised learning. The project provides a hands-on tutorial for designing and implementing a tiny autoencoder (AE) model, which is trained to identify potentially new physics events from proton collision data obtained from the CMS Open Data. In this example you will learn to compress and decompress data using the autoencoder, train it on background data, and evaluate its performance on both background and New Physics simulated samples. The project also covers advanced techniques like quantization-aware training using QKeras and model deployment on FPGA firmware with hls4ml, providing a comprehensive learning experience that bridges theoretical concepts with practical implementation.
+
+Link to repo: [GitHub](https://github.com/thaarres/qcd_school_ml/)
+
+## Key Differences
 
 - **Supervised vs. Unsupervised**: CNNs require labeled data for training (supervised), while autoencoders learn from unlabeled data (unsupervised).
 - **Output**: CNNs produce predictions based on input data labels (classification/regression), whereas autoencoders reconstruct input data or learn compressed representations.
@@ -58,8 +70,8 @@ Understanding these machine learning techniques equips researchers with powerful
 
 ::::::::::::::::::::::::::::::::::::: keypoints 
 
-- Introduction to machine learning applications in particle physics.
-- Detailed exploration of CNNs and autoencoders architectures.
-- Practical insights into training and deploying ML models in HEP.
+- **Supervised vs. Unsupervised:** CNNs require labeled data for training, making them suited for supervised learning tasks where the model learns from explicit examples with known outcomes. Autoencoders, in contrast, utilize unlabeled data and excel in unsupervised learning, focusing on learning data representations and detecting anomalies without predefined labels.
+- **Output:** CNNs produce predictions or classifications based on input data labels (classification/regression), whereas autoencoders aim to reconstruct input data or generate compressed representations for further analysis.
+- **Use Cases:** CNNs are ideal for tasks involving structured data such as detector images, where precise classification or segmentation is needed. Autoencoders are particularly useful for exploratory tasks, anomaly detection, and dimensionality reduction in complex datasets where direct supervision is not available.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/fig/metrics.png b/episodes/fig/metrics.png
diff --git a/episodes/fig/ml-vs-ai.png b/episodes/fig/ml-vs-ai.png