Prune, Quantize, Deploy
Efficient Edge AI from Model to Silicon
Pong Trairatvorakul · Director of Strategy · femtoAI
EDGE AI Taipei 2025

Prune, Quantize, Deploy Workshop

Visit workshop.femto.ai

EDGE AI Taipei 2025

Introduction
This hands-on workshop walks participants through the full edge AI deployment pipeline: from a Spoken Language Understanding model occupying a few MB of memory in PyTorch to real-time inference on production hardware. Attendees will learn to compress models using pruning and quantization techniques, then deploy them on the femtoAI SPU-001 AI accelerator, a hardware platform designed specifically to exploit unstructured sparsity for ultra-efficient inference.
By the end, you will have a working understanding of how to prepare, optimize, and deploy compressed neural networks on real hardware, and the confidence to apply these techniques in your own edge AI workflows.
This workshop is a condensed version of the E2E Example on the femtoAI Developer Portal; visit the portal for the full version of the tutorial.


Setup
Some of these steps may take a long time to download. To help with this, a USB drive with the files is available during the workshop; see "Installing from USB Drive" below.
Environment Setup
  1. Download the workshop code from here and unzip it
  2. Install Conda for Python environment management
  3. Install and run Docker if you don’t have it already
Setup
1. Open Terminal and change directory into downloaded zip folder
cd femtoAI_taipei_workshop
2. Create and activate Conda environment
conda create -n femtocrux_env python=3.10
conda activate femtocrux_env
3. Install femtocrux (femtoAI compiler API) from PyPI
pip install femtocrux
4. Download compiler docker image
python download_compiler.py
You can obtain the compiler key from: developer.femto.ai (an account and NDA is required)
5. Run setup inside the repo
bash setup.sh
This installs the pip requirements in requirements.txt, downloads the Google Speech Commands dataset, and splits it into train, test, and validation sets. This step may take a few minutes to run.
Note: You may need to install wget. On a Mac, we suggest using Homebrew (brew install wget).
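The dataset split performed by setup.sh can be sketched as follows. This is a minimal illustrative stand-in, not the script's actual logic: the `split_dataset` helper and the 80/10/10 ratios are assumptions for demonstration.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle and partition items into train/val/test splits.

    Hypothetical stand-in for the split performed by setup.sh.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# 1000 placeholder "clips", split 80/10/10
train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))
```

A fixed seed keeps the split reproducible across runs, which matters when you later compare checkpoints trained at different stages.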

Installing from USB Drive
Loading the compiler docker image
1. Load the compiler image into Docker
docker load -i /Volumes/FEMTO_USB/femtocrux_2.5.1.tar
2. Verify image is loaded:
docker images
Now when you run pip install femtocrux, the code will automatically use the local image instead of trying to fetch it remotely.
Loading the workshop code
Copy the femtoAI_taipei_workshop.zip file to your computer and unzip it


Training, Pruning, Quantization
Since these steps can take a long time to run and are sequential, we will start each step from a saved checkpoint (like a cooking show)

1. Train From Scratch

Change directory into src:
cd src
Start the training run:
python3 train.py \
  globals.device=cpu \
  globals.batch_size=128 \
  globals.exp_tag=bce-max-fp \
  globals.epochs=2 \
  globals.prune_model=false \
  globals.quant_model=false
If you use a Mac with Metal support, you can set globals.device=mps. You may need to set PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 if MPS runs out of memory:
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
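The exp_tag bce-max-fp suggests a binary cross-entropy objective. As a toy illustration of what each training step does, here is a NumPy sketch of one-layer logistic regression trained with BCE gradients. This is illustrative only, not the workshop's actual SLU model or train.py's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_step(w, x, y, lr=0.1):
    """One gradient step of binary cross-entropy for a linear model:
    a toy stand-in for what train.py does per batch."""
    p = sigmoid(x @ w)
    grad = x.T @ (p - y) / len(y)   # dBCE/dw for logistic regression
    return w - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 8))          # 128 "clips", 8 features each
true_w = rng.normal(size=8)
y = (x @ true_w > 0).astype(float)     # linearly separable labels

w = np.zeros(8)
for _ in range(200):                   # "epochs" over the toy batch
    w = bce_step(w, x, y)

acc = np.mean((sigmoid(x @ w) > 0.5) == (y == 1))
print(f"train accuracy: {acc:.2f}")
```

The real run replaces this linear model with the SLU network and iterates over mini-batches of the Speech Commands dataset, but the loop structure is the same.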

2. Pruning Train Run

python3 train.py \
  globals.device=cpu \
  globals.batch_size=128 \
  globals.exp_tag=bce-max-fp-prune \
  globals.epochs=2 \
  model_load.checkpoint_path=./assets/bce-max-fp/checkpoints/model_final.pt \
  model_load.model_type=fp \
  globals.prune_model=true \
  prune_cfg.prune_init=0.0 \
  prune_cfg.prune_target=0.8
If you use a Mac with Metal support, you can set globals.device=mps.
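Unstructured magnitude pruning, ramped from prune_init=0.0 up to prune_target=0.8, can be sketched like this. The NumPy helper below is an illustrative assumption about the mechanism; the actual schedule and masking live in train.py.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude.

    Unstructured: individual weights are removed wherever they fall,
    with no block or channel pattern imposed.
    """
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)

# Ramp sparsity from 0.0 to 0.8 across training, re-pruning after each
# (hypothetical) fine-tuning step so the network can adapt gradually.
for sparsity in np.linspace(0.0, 0.8, 5):
    w = magnitude_prune(w, sparsity)

achieved = float(np.mean(w == 0))
print(f"final sparsity: {achieved:.2f}")
```

The gradual ramp is why the run takes checkpoints and epochs: pruning everything at once typically costs far more accuracy than pruning a little and retraining between steps.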

3. Post Training Quantization

python3 train.py \
  globals.device=cpu \
  globals.batch_size=128 \
  globals.exp_tag=bce-max-prune-quantize \
  model_load.checkpoint_path=./assets/bce-max-fp-prune/checkpoints/model_final.pt \
  model_load.model_type=fp \
  globals.prune_model=true \
  prune_cfg.prune_init=0.8 \
  prune_cfg.prune_target=0.8 \
  globals.epochs=0 \
  globals.quant_model=true
If you use a Mac with Metal support, you can set globals.device=mps.
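Post-training quantization maps the trained float weights onto a fixed integer grid with no further training (note globals.epochs=0). Below is a minimal sketch of symmetric per-tensor int8 quantization; this is a common illustrative scheme, and the femtoAI toolchain's exact format may differ.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale     # dequantized reconstruction

err = float(np.abs(w - w_hat).max())
print(f"scale={scale:.5f}, max abs error={err:.5f}")
```

The reconstruction error is bounded by half a quantization step (scale / 2), which is why PTQ alone often works, and why the next QAT step exists for the cases where it doesn't.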

4. Quantization Aware Training Run (QAT)

python3 train.py \
  globals.device=cpu \
  globals.batch_size=128 \
  globals.exp_tag=bce-max-prune-QAT \
  model_load.checkpoint_path=./assets/bce-max-prune-quantize/checkpoints/model_final.pt \
  model_load.model_type=quant \
  globals.prune_model=true \
  prune_cfg.prune_init=0.8 \
  prune_cfg.prune_target=0.8 \
  globals.epochs=1 \
  globals.quant_model=true
This step is not compatible with globals.device=mps.
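QAT inserts "fake quantization" into the forward pass, so the loss sees quantized weights, while the backward pass uses the straight-through estimator (STE): gradients flow through the rounding as if it were the identity, and updates are applied to a float "shadow" copy of each weight. A one-weight NumPy sketch, illustrative only:

```python
import numpy as np

def fake_quant(w, scale):
    """Forward: snap to the int8 grid. Backward (straight-through
    estimator): treat d(fake_quant)/dw as 1, so gradients pass
    through the rounding unchanged."""
    return np.clip(np.round(w / scale), -127, 127) * scale

# A few hypothetical QAT steps on a single weight being pulled
# toward a target of 0.0 by a squared-error loss.
w, scale, lr, target = 0.30, 0.01, 0.1, 0.0
for _ in range(5):
    w_q = fake_quant(w, scale)
    grad = 2 * (w_q - target)   # d/dw_q of (w_q - target)**2
    w -= lr * grad              # STE: apply the gradient to the float weight
print(round(w, 4))
```

The float shadow weight keeps accumulating sub-step updates that plain rounding would discard, which is how QAT recovers accuracy that PTQ loses.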

Troubleshooting
If you run into an error about missing hid library:
brew install libusb hidapi
export DYLD_LIBRARY_PATH=/opt/homebrew/lib
If you run into an error about numpy version:
pip install "numpy<2.0"


Deploy
Load firmware onto the EVK
  1. Locate the firmware .hex file at /src/assets/speech_command_firmware.hex
  2. Download the Teensy 4.1 Firmware Loader tool for your OS from: https://www.pjrc.com/teensy/loader.html
  3. Open the Loader tool, select File > Open HEX File, and select the .hex file located in step 1
  4. Connect the EVK to your computer with the USB cable, turn it on, then press the PROGRAM FIRMWARE button located between the Audio In and Audio Out jacks
  5. Complete the programming by selecting Operation > Program, then Operation > Reboot
Load model onto the EVK
  1. Locate the model files (0PROG_A, 0PROG_D, 0PROG_P) in:
    /src/assets/bce-max-prune-QAT/model_datas/e2e_example/io_records/apb_records
  2. Copy all three files (0PROG_A, 0PROG_D, 0PROG_P) to the root directory of the SD card
  3. Insert the SD card into the EVK
Test the deployed model
  1. Turn on the EVK using the switch on the side
  2. Wait until the light flashes green
  3. Say “Yes” and the light should blink


Continue Your Edge AI Journey
Deep Dive into the Code
The workshop offered a foundational look into Edge AI. Now, take your time to explore the code, experiment with different parameters, and deepen your understanding of femtoAI's capabilities at your own pace.
Experience Sparsity in Action
Head over to the femtoAI demo table for a hands-on experience. See real-world applications of our cutting-edge sparsity technology and witness the efficiency of our models firsthand.
Join the Developer Community
Sign Up for Free
Unlock full access to femtoAI's tools, documentation, and resources. Create your free developer account today to start building and deploying your own efficient Edge AI solutions.
