This project implements a complete intent parsing system for technician-style instructions in industrial and microgrid environments. It takes raw natural language like:
“check the inverter temperature and update the power limit to 20%”
…and outputs structured actions:
```json
{
  "intent": "update_parameter",
  "target": "inverter",
  "parameter": {
    "name": "power_limit",
    "value": "20%"
  }
}
```

The system benchmarks three NLP modeling families:
- TF-IDF + Linear SVM — baseline intent classification
- LSTM / BiLSTM — multi-output classification (intent + target + parameter)
- DistilBERT Token Classifier — end-to-end structured extraction
The goal: build a clean, production-style pipeline that demonstrates how traditional ML, classical deep learning, and modern transformers differ in capability and performance.
Because real technician logs are private and inconsistent, the project builds a controlled, balanced synthetic dataset covering:
- 10+ intents
- 15+ equipment targets
- 20+ parameter types
- Numeric, categorical, and percentage values
The generator is flexible enough to extend or adapt to real operational logs later.
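A minimal sketch of how such a template-based generator could look. The specific templates, targets, and parameter lists below are illustrative assumptions, not the notebook's actual vocabulary:

```python
import random

# Hypothetical intent templates; the notebook's real template set is larger
# and covers 10+ intents, 15+ targets, and 20+ parameter types.
INTENTS = {
    "update_parameter": "set the {target} {param} to {value}",
    "check_status": "check the {target} {param}",
    "reset": "reset the {target} {param} to {value}",
}
TARGETS = ["inverter", "battery bank", "transformer"]
PARAMS = {
    "power_limit": ["20%", "50%"],      # percentage values
    "frequency": ["50hz", "60hz"],      # numeric values with units
    "mode": ["auto", "manual"],         # categorical values
}

def generate_example(rng=random):
    """Sample one (text, structured label) pair from the templates."""
    intent = rng.choice(list(INTENTS))
    template = INTENTS[intent]
    target = rng.choice(TARGETS)
    param = rng.choice(list(PARAMS))
    # Only fill a value when the template actually mentions one.
    value = rng.choice(PARAMS[param]) if "{value}" in template else None
    text = template.format(target=target,
                           param=param.replace("_", " "),
                           value=value)
    return {"text": text, "intent": intent, "target": target,
            "parameter": {"name": param, "value": value}}
```

Balancing then reduces to sampling each intent/target/parameter combination a fixed number of times.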
Each model solves the same set of tasks:
- Intent classification
- Target identification
- Parameter extraction
This allows for direct comparison between:
| Model | Strengths | Weaknesses |
|---|---|---|
| TF-IDF + SVM | Fast, simple, solid baseline | No structured extraction |
| LSTM / BiLSTM | Multi-output, learns patterns | Needs hand-engineered preprocessing |
| DistilBERT | Best overall generalisation; robust extraction | Heavier, slower on CPU |
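The baseline row in the table can be sketched in a few lines of scikit-learn. The toy training set here is illustrative; the notebook trains on the full synthetic dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus standing in for the generated dataset.
texts = [
    "check the inverter temperature",
    "set the power limit to 20%",
    "reset the inverter frequency to 50hz",
    "check the battery voltage",
]
labels = ["check_status", "update_parameter", "reset", "check_status"]

# Word and bigram TF-IDF features feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["check the transformer temperature"]))
```

As the table notes, this gives the intent label only; it has no mechanism for extracting the target or parameter values.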
A final pipeline demonstrates how raw text becomes structured output:
- Tokenisation
- Transformer inference
- Slot grouping
- Value extraction
- JSON-like final output
This is the most production-like part of the system and the highlight of the project.
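The "slot grouping" step above can be sketched as collapsing per-token BIO tags into field values. The tag names (`B-TARGET`, etc.) are illustrative and may differ from the notebook's exact label set:

```python
def group_slots(tokens, tags):
    """Merge BIO-tagged tokens into {slot_name: text_span} pairs."""
    slots, current_tag, current_toks = {}, None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new slot begins
            if current_tag:
                slots[current_tag] = " ".join(current_toks)
            current_tag, current_toks = tag[2:].lower(), [tok]
        elif tag.startswith("I-") and current_tag == tag[2:].lower():
            current_toks.append(tok)        # slot continues
        else:                               # outside any slot
            if current_tag:
                slots[current_tag] = " ".join(current_toks)
            current_tag, current_toks = None, []
    if current_tag:                         # flush the last open slot
        slots[current_tag] = " ".join(current_toks)
    return slots

tokens = ["reset", "the", "inverter", "frequency", "to", "50hz"]
tags = ["O", "O", "B-TARGET", "B-PARAM", "O", "B-VALUE"]
# group_slots(tokens, tags)
# → {"target": "inverter", "param": "frequency", "value": "50hz"}
```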
The notebook is intentionally structured so that:
- Students can follow it
- Recruiters can evaluate it quickly
- Engineers can adapt it to production
Sections include:
- Imports & Setup
- Dataset Generation
- Preprocessing
- TF-IDF Baseline
- LSTM / BiLSTM Models
- DistilBERT Model
- Unified Evaluation
- End-to-End Demo
- Model Comparison Summary
```
Raw Instruction
      ↓
Preprocessing
      ↓
Three Model Families
      ↓
Unified Evaluation Layer
      ↓
End-to-End Parser Demo
      ↓
Structured JSON Output
```
Run the final cell in the notebook:
```python
parse_instruction("reset the inverter frequency to 50hz")
```

Output:
```json
{
  "intent": "reset",
  "target": "inverter",
  "parameter": {
    "name": "frequency",
    "value": "50hz"
  }
}
```

While scores vary depending on the exact synthetic dataset, trends are consistent:
- TF-IDF + SVM performs well for intent only
- BiLSTM improves multi-output performance
- DistilBERT dominates structured extraction
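Extracted `value` strings such as `"50hz"` or `"20%"` can be normalised further before reaching control code. A minimal sketch; the regex and unit list are assumptions, not part of the notebook:

```python
import re

# Hypothetical post-processing: split a raw value like "50hz" or "20%"
# into a number and a unit so downstream systems get typed data.
UNIT_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(%|hz|v|a|c|kw)?\s*$", re.I)

def normalise_value(raw):
    m = UNIT_PATTERN.match(raw)
    if not m:
        return {"raw": raw}  # leave unrecognised (e.g. categorical) values as-is
    number, unit = m.groups()
    return {"number": float(number), "unit": (unit or "").lower(), "raw": raw}

normalise_value("50hz")  # → {"number": 50.0, "unit": "hz", "raw": "50hz"}
normalise_value("20%")   # → {"number": 20.0, "unit": "%", "raw": "20%"}
```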
Transformers are recommended if you want:
- robustness to grammar changes
- better understanding of technician lingo
- high accuracy on parameter extraction
```bash
pip install -r requirements.txt
```

Or install the notebook dependencies manually:

```
transformers
tensorflow
torch
pandas
numpy
scikit-learn
tqdm
matplotlib
```
```
.
├── e2e_intent_parser.ipynb
├── README.md
└── data/
```
Industrial environments need interpretable AI — not black boxes.
Technicians speak in short, imperative commands with variable structure. This project shows how to build an AI system that can:
- parse real operator instructions
- extract actionable parameters
- integrate into predictive maintenance and control systems
- adapt across equipment types
It’s both a portfolio piece and a template for real deployments.
Planned improvements:
- Add CRF layer on top of DistilBERT
- Add error-recovery heuristics for incomplete queries
- Export a standalone Python library (`intent_parser/`)
- Production API (FastAPI + lightweight ONNX model)