Automatic Code Evaluation System Using CodeBERT and Token-based Similarity
A comprehensive research project for evaluating code similarity using state-of-the-art deep learning (CodeBERT) combined with traditional token-based analysis methods.
- Overview
- Features
- Architecture
- Installation
- Usage
- Research Methodology
- Project Structure
- API Reference
- Contributing
- License
Code Inspector is an automated code evaluation system designed to assess the similarity between student submissions and reference implementations. It combines:
- CodeBERT Analysis: Deep learning-based semantic understanding of code
- Token Similarity: Traditional token-based comparison for structural analysis
- Combined Scoring: Weighted combination of both methods for accurate evaluation
The underlying research goal is to measure how accurately the combination of CodeBERT and token-based similarity can assess the functional equivalence of code without executing it.
- Dual Evaluation Methods
  - CodeBERT semantic embeddings
  - Token-based similarity (Jaccard and Dice coefficients; sketched below, after this list)
- Multiple Input Sources
  - Direct code input
  - File upload
  - GitHub repository analysis
- Comprehensive Reporting
  - HTML reports with visualizations
  - JSON exports for further analysis
  - Text reports for documentation
- Web Interface
  - User-friendly Flask-based UI
  - Real-time evaluation
  - Batch processing support
- Accuracy Measurement
  - MAE, RMSE, R² metrics
  - Classification accuracy
  - Error analysis and visualization
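As a quick illustration of the token-based metrics above, here is a minimal sketch that uses a crude regex tokenizer as a stand-in for the project's token_similarity_evaluator.py:

```python
import re

def tokenize(code):
    """Crude stand-in tokenizer: identifiers and keywords only."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(code_a, code_b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| over token sets."""
    a, b = tokenize(code_a), tokenize(code_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def dice(code_a, code_b):
    """Dice coefficient: 2·|A ∩ B| / (|A| + |B|) over token sets."""
    a, b = tokenize(code_a), tokenize(code_b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

print(jaccard("def add(a, b): return a + b", "def add(x, y): return x + y"))  # ~0.43
print(dice("def add(a, b): return a + b", "def add(x, y): return x + y"))     # 0.60
```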
```
Code Inspector
│
├── Input Layer
│   ├── GitHub Repository Manager
│   ├── File Upload Handler
│   └── Code Preprocessor
│
├── Evaluation Layer
│   ├── CodeBERT Evaluator
│   │   ├── Model: microsoft/codebert-base
│   │   ├── Embedding Generation
│   │   └── Cosine Similarity
│   └── Token Similarity Evaluator
│       ├── Tokenization
│       ├── Identifier Extraction
│       └── Jaccard/Dice Similarity
│
├── Combination Layer
│   └── Score Combiner
│       ├── Weighted Average
│       ├── Pass/Fail Decision
│       └── Recommendations
│
├── Output Layer
│   ├── Report Generator (HTML/JSON/Text)
│   ├── Accuracy Calculator
│   └── Visualization
│
└── Interface Layer
    ├── Flask Web Application
    └── Command-line Interface
```
- Python 3.8 or higher
- pip package manager
- Git (for GitHub integration)
- (Optional) CUDA-capable GPU for faster CodeBERT inference
```bash
git clone https://github.com/yourusername/CodeInspector.git
cd CodeInspector
```

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Note: The first installation may take several minutes because it downloads the CodeBERT model (~500 MB).
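The CodeBERT weights can also be cached ahead of time, which is useful if the first evaluation should not pay the download cost. This is an optional convenience step using the standard HuggingFace Transformers calls, not one of the project's own scripts:

```python
from transformers import AutoModel, AutoTokenizer

# Downloads microsoft/codebert-base into the local HuggingFace cache (~500 MB)
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
```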
```bash
# Test with sample data
python main.py --student-code samples/student_code_high_similarity.py --reference-code samples/reference_code.py
```

- Start the web server:

  ```bash
  python app.py
  ```

- Open your browser and navigate to: http://localhost:5000
- Upload or paste code and click "Evaluate Code"
Command-line evaluation:

```bash
python main.py \
    --student-code path/to/student.py \
    --reference-code path/to/reference.py \
    --requirements path/to/requirements.txt \
    --language python \
    --output-dir reports \
    --format html
```

Evaluating a GitHub repository:

```bash
python main.py \
    --github-url https://github.com/student/project \
    --reference-code path/to/reference.py \
    --language python
```

Using the Python API:

```python
from main import CodeInspector

# Initialize
inspector = CodeInspector(
    codebert_weight=0.6,
    token_weight=0.4,
    pass_threshold=0.7
)

# Evaluate code
results = inspector.evaluate_code(
    student_code="def add(a, b): return a + b",
    reference_code="def add(x, y): return x + y",
    language='python'
)

# Generate report
inspector.generate_report(
    results,
    student_info={'name': 'John Doe'},
    output_format='html'
)
```
- Preprocessing
  - Code normalization
  - Comment removal (configurable)
  - Whitespace standardization
- CodeBERT Analysis (a minimal sketch follows this list)
  - Convert code to embeddings using pre-trained CodeBERT
  - Calculate cosine similarity between embeddings
  - Generate semantic similarity score (0-1)
- Token Analysis
  - Extract identifiers, keywords, and tokens
  - Calculate Jaccard similarity
  - Generate structural similarity score (0-1)
- Score Combination
  - Default: Weighted average (CodeBERT: 60%, Token: 40%)
  - Alternative methods: Average, Max, Min, Harmonic mean
- Grading
  - A: ≥90% similarity
  - B: 80-89%
  - C: 70-79%
  - D: 60-69%
  - F: <60%
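The snippet below is a minimal sketch of the CodeBERT step and the grade mapping described above, assuming mean pooling of the last hidden state as the embedding (the project's codebert_evaluator.py may pool differently, e.g. via the [CLS] token) and a simple rescaling of cosine similarity into [0, 1]; the token score is plugged in as a plain number here, for instance the Jaccard value from the sketch in the Features section:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: mean-pooled last hidden state as the code embedding.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def embed(code):
    """Encode a code snippet into a single CodeBERT embedding vector."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)             # mean pooling over tokens

def semantic_similarity(code_a, code_b):
    """Cosine similarity of the two embeddings, rescaled from [-1, 1] to [0, 1]."""
    cos = torch.nn.functional.cosine_similarity(embed(code_a), embed(code_b), dim=0).item()
    return (cos + 1.0) / 2.0

def grade(combined_score):
    """Map a 0-1 combined score onto the grade bands listed above."""
    pct = combined_score * 100
    if pct >= 90: return "A"
    if pct >= 80: return "B"
    if pct >= 70: return "C"
    if pct >= 60: return "D"
    return "F"

student = "def add(a, b): return a + b"
reference = "def add(x, y): return x + y"
codebert_score = semantic_similarity(student, reference)
token_score = 0.43                                   # e.g. Jaccard value from the earlier sketch
combined = 0.6 * codebert_score + 0.4 * token_score  # default 60/40 weighting
print(f"combined: {combined:.3f}, grade: {grade(combined)}")
```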
The system measures its own accuracy using:
- Mean Absolute Error (MAE): Average prediction error
- Root Mean Squared Error (RMSE): Error with penalty for large deviations
- R² Score: Coefficient of determination between predicted and ground-truth scores
- Classification Accuracy: Pass/fail decision accuracy
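The sketch below shows one way the metrics listed above can be computed from predicted and ground-truth similarity scores, assuming scores in [0, 1] and the default pass threshold of 0.7; the project's accuracy_calculator.py may differ in details:

```python
import numpy as np

def accuracy_metrics(predicted, ground_truth, pass_threshold=0.7):
    """Compute MAE, RMSE, R², and pass/fail classification accuracy."""
    y_pred = np.asarray(predicted, dtype=float)
    y_true = np.asarray(ground_truth, dtype=float)
    errors = y_pred - y_true

    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))

    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Agreement between predicted and ground-truth pass/fail decisions
    classification_accuracy = float(np.mean(
        (y_pred >= pass_threshold) == (y_true >= pass_threshold)
    ))
    return {"mae": mae, "rmse": rmse, "r2": r2,
            "classification_accuracy": classification_accuracy}

print(accuracy_metrics([0.91, 0.65, 0.48], [0.88, 0.72, 0.50]))
```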
```
CodeInspector/
│
├── main.py                             # Main orchestrator
├── app.py                              # Flask web application
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
│
├── Core Modules/
│   ├── github_manager.py               # GitHub integration
│   ├── code_preprocessor.py            # Code preprocessing
│   ├── codebert_evaluator.py           # CodeBERT evaluation
│   ├── token_similarity_evaluator.py   # Token-based evaluation
│   ├── score_combiner.py               # Score combination
│   ├── accuracy_calculator.py          # Accuracy metrics
│   └── report_generator.py             # Report generation
│
├── templates/                          # HTML templates
│   ├── index.html                      # Main page
│   └── report.html                     # Report page
│
├── samples/                            # Sample data
│   ├── reference_code.py
│   ├── student_code_high_similarity.py
│   ├── student_code_medium_similarity.py
│   ├── student_code_low_similarity.py
│   └── requirements.txt
│
├── reports/                            # Generated reports (created at runtime)
├── uploads/                            # Uploaded files (created at runtime)
└── data/                               # Dataset storage (optional)
```
Edit the initialization in main.py or app.py:
```python
inspector = CodeInspector(
    codebert_weight=0.6,   # CodeBERT importance (0-1)
    token_weight=0.4,      # Token similarity importance (0-1)
    pass_threshold=0.7     # Minimum score to pass (0-1)
)
```

Available methods in score_combiner.py:

- `weighted`: Custom weights (default)
- `average`: Simple average
- `max`: Take the maximum score
- `min`: Take the minimum score (conservative)
- `harmonic`: Harmonic mean (penalizes low scores)
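For reference, the sketch below shows one plausible implementation of these five strategies over a pair of scores; the function and parameter names are illustrative rather than the actual score_combiner.py API:

```python
def combine_scores(codebert_score, token_score, method="weighted",
                   codebert_weight=0.6, token_weight=0.4):
    """Combine two similarity scores (each in [0, 1]) into a single score."""
    if method == "weighted":
        return codebert_weight * codebert_score + token_weight * token_score
    if method == "average":
        return (codebert_score + token_score) / 2
    if method == "max":
        return max(codebert_score, token_score)
    if method == "min":
        return min(codebert_score, token_score)
    if method == "harmonic":
        if codebert_score == 0 or token_score == 0:
            return 0.0
        return 2 * codebert_score * token_score / (codebert_score + token_score)
    raise ValueError(f"Unknown combination method: {method}")

# A low token score drags the harmonic mean down more than the plain average:
print(combine_scores(0.9, 0.3, method="average"))   # 0.6
print(combine_scores(0.9, 0.3, method="harmonic"))  # 0.45
```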
Currently supported:
- Python (.py)
- Java (.java)
- JavaScript (.js)
- C++ (.cpp)
- C (.c)
To add more languages, extend the preprocessor and tokenizer.
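How a new language plugs in depends on code_preprocessor.py and the tokenizer, which are not documented here; the sketch below is purely a hypothetical illustration of what a per-language registry could look like (Go is used as the example), not the project's actual extension API:

```python
import re

# Hypothetical per-language configuration: a line-comment marker for the
# preprocessor and an identifier pattern for the tokenizer.
LANGUAGE_CONFIG = {
    "python": {"line_comment": "#",  "token_pattern": r"[A-Za-z_]\w*"},
    "java":   {"line_comment": "//", "token_pattern": r"[A-Za-z_$][\w$]*"},
}

def register_language(name, line_comment, token_pattern):
    """Register a language so preprocessing and tokenization can handle it."""
    LANGUAGE_CONFIG[name] = {"line_comment": line_comment, "token_pattern": token_pattern}

def strip_line_comments(code, language):
    """Remove line comments using the registered marker for the given language."""
    marker = LANGUAGE_CONFIG[language]["line_comment"]
    return "\n".join(line.split(marker, 1)[0] for line in code.splitlines())

# Example: adding Go support
register_language("go", line_comment="//", token_pattern=r"[A-Za-z_]\w*")
print(strip_line_comments("x := 1 // loop counter", "go"))  # -> "x := 1 "
```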
```python
class CodeInspector:
    def __init__(self, github_token=None, codebert_weight=0.6,
                 token_weight=0.4, pass_threshold=0.7)

    def evaluate_code(self, student_code, reference_code,
                      requirements=None, language='python',
                      combination_method='weighted') -> Dict

    def evaluate_github_project(self, student_url, reference_code,
                                requirements=None, language='python') -> Dict

    def batch_evaluate(self, student_codes, reference_code,
                       requirements=None, language='python') -> List[Dict]

    def generate_report(self, evaluation_results, student_info=None,
                        requirements=None, output_format='all') -> Dict[str, str]
```

Web routes (app.py):

- `GET /`: Home page
- `POST /evaluate`: Evaluate code submission
- `GET /report/<eval_id>`: View evaluation report
- `GET /download/<eval_id>/<format>`: Download report (html/json/text)
- `POST /batch-evaluate`: Batch evaluation
- `GET /api/health`: Health check
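For driving the web API from a script, the endpoint paths come from the list above, but the request and response field names below (student_code, reference_code, language) are assumptions about app.py rather than documented behaviour, and the evaluate endpoint may expect form fields instead of JSON:

```python
import requests

BASE_URL = "http://localhost:5000"

# Health check (route listed above)
print(requests.get(f"{BASE_URL}/api/health").json())

# Submit an evaluation; field names are assumed, not confirmed by app.py.
payload = {
    "student_code": "def add(a, b): return a + b",
    "reference_code": "def add(x, y): return x + y",
    "language": "python",
}
response = requests.post(f"{BASE_URL}/evaluate", json=payload)
print(response.status_code, response.json())
```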
```bash
# Test high similarity
python main.py --student-code samples/student_code_high_similarity.py --reference-code samples/reference_code.py

# Test medium similarity
python main.py --student-code samples/student_code_medium_similarity.py --reference-code samples/reference_code.py

# Test low similarity
python main.py --student-code samples/student_code_low_similarity.py --reference-code samples/reference_code.py
```

Expected results:

- High Similarity: 85-95% combined score, Grade A/B
- Medium Similarity: 60-75% combined score, Grade B/C
- Low Similarity: 40-55% combined score, Grade C/D
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see LICENSE file for details.
If you use this project in your research, please cite:
```bibtex
@software{codeinspector2024,
  title={Code Inspector: Automated Code Evaluation Using CodeBERT and Token Similarity},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/CodeInspector}
}
```

- Microsoft Research for the CodeBERT model
- HuggingFace for the Transformers library
- Flask framework for web interface
For questions, issues, or feature requests:
- Open an issue on GitHub
- Email: your.email@example.com
Future enhancements:
- Support for more programming languages
- Custom model fine-tuning
- Plagiarism detection
- Code quality metrics
- Integration with LMS platforms
- Real-time collaboration features
- Advanced visualization dashboards
Built with ❤️ for Computer Science Education