# A Method for Identifying Farmland System Habitat Types Based on the Dynamic-Weighted Feature Fusion Network Model
This repository contains the official implementation of our paper "A Method for Identifying Farmland System Habitat Types Based on the Dynamic-Weighted Feature Fusion Network Model" ([arXiv:2511.11659](https://arxiv.org/abs/2511.11659)).
- Introduction
- Architecture
- Installation
- Dataset Preparation
- Training
- Evaluation
- Results
- Comparison Experiments
- Citation
- Acknowledgements
## Introduction

In this paper, we propose DWFF-Net (Dynamic-Weighted Feature Fusion Network), a novel approach to semantic segmentation of remote sensing imagery. Our method leverages the powerful pre-trained DINOv3 vision transformer as the backbone and introduces a dynamic-weighted feature fusion module to effectively capture both high-level semantic information and low-level detail.

The study addresses the current lack of a standardized habitat classification system for cultivated land ecosystems, the incomplete coverage of habitat types, and the inability of existing models to effectively integrate semantic and texture features, which results in insufficient segmentation accuracy and blurred boundaries for multi-scale habitats.
Key contributions:
- A comprehensively annotated ultra-high-resolution remote sensing image dataset encompassing 15 categories of cultivated land system habitats
- Dynamic-Weighted Feature Fusion Network (DWFF-Net) that utilizes a frozen-parameter DINOv3 to extract foundational features with a data-level adaptive dynamic weighting strategy for feature fusion
- Superior performance compared to state-of-the-art methods with mIoU of 0.6979 and F1-score of 0.8049
## Architecture

Our proposed DWFF-Net consists of three main components:
- Backbone: DINOv3 vision transformer (pre-trained) - A frozen DINOv3-ViT-L/16 model is used as a multi-level feature extractor
- Feature Extractor: Multi-level feature extraction from different transformer layers (specifically layers 1, 8, 16, and 24 as shown in the configuration)
- Decoder: Dynamic-weighted feature fusion network for semantic segmentation - A novel decoder that features a Dynamic Weight Feature Fusion (DWFF) mechanism to produce the final segmentation map
*Figure: Overall architecture of the Dynamic-Weighted Feature Fusion Net (DWFF-Net).*
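For intuition, the sketch below shows one plausible form of the data-adaptive dynamic-weighted fusion step described above. It is an illustration under our own assumptions (the module name, the pooling-plus-MLP weight predictor, and a shared channel width across levels), not the repository's implementation.

```python
# Illustrative sketch of dynamic-weighted feature fusion (assumptions, not the
# repo's code): per-image weights are predicted from pooled feature statistics
# and used to blend multi-level backbone features.
import torch
import torch.nn as nn

class DynamicWeightedFusion(nn.Module):
    def __init__(self, channels, num_levels=4):
        super().__init__()
        # Predict one weight per feature level from globally pooled statistics.
        self.weight_head = nn.Sequential(
            nn.Linear(channels * num_levels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, num_levels),
        )

    def forward(self, feats):
        # feats: list of num_levels tensors, each of shape (B, C, H, W).
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)  # (B, C*L)
        weights = self.weight_head(pooled).softmax(dim=1)               # (B, L)
        stacked = torch.stack(feats, dim=1)                             # (B, L, C, H, W)
        return (weights[:, :, None, None, None] * stacked).sum(dim=1)  # (B, C, H, W)
```

Because the weights are predicted from the input itself, each image can emphasize shallow texture levels or deep semantic levels differently, which is the intuition behind the data-level adaptive weighting strategy.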

## Installation

- Clone this repository:

```bash
git clone https://github.com/sysau/DWFF-Net.git
cd DWFF-Net
```

- Install required packages:

```bash
pip install -r requirements.txt
```

Note: This project requires Python 3.8+, PyTorch 2.0+, and the DINOv3 model from Hugging Face Transformers. Training was conducted with a batch size of 4 and gradient accumulation every 8 steps (effective batch size 32), distributed across two RTX 2080 Ti 11 GB GPUs with FP16 mixed-precision training (see the accumulation sketch after this list).
- Download the DINOv3 model weights: due to licensing constraints, these must be obtained from the official repository.
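Once the weights are in place, multi-level hidden states can be extracted through the standard Transformers API, roughly as in the sketch below. The checkpoint id is an assumption; point it at the weights you downloaded.

```python
# Hedged sketch of multi-level feature extraction with Hugging Face Transformers.
# The checkpoint id is an assumption; substitute the DINOv3 weights you obtained.
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "facebook/dinov3-vitl16-pretrain-lvd1689m"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt).eval()

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # dummy input
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] holds the patch embeddings; entries 1..24 are the outputs of
# the 24 ViT-L layers, so the layers used here are indices 1, 8, 16, and 24 (== -1).
feats = [out.hidden_states[i] for i in (1, 8, 16, -1)]
```

The installation note above mentions gradient accumulation; it corresponds to the standard PyTorch loop below, shown with a toy model and fake batches so the sketch runs on its own:

```python
# Gradient accumulation matching the stated setup: batch size 4, optimizer step
# every 8 batches, i.e. an effective batch size of 32. The tiny model and random
# data are placeholders, not the repo's training code.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=1)          # stand-in for DWFF-Net
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [(torch.randn(4, 3, 64, 64), torch.randint(0, 16, (4, 64, 64)))
          for _ in range(16)]                     # fake batches of size 4

accum_steps = 8
optimizer.zero_grad()
for step, (images, masks) in enumerate(loader):
    loss = criterion(model(images), masks) / accum_steps  # average over window
    loss.backward()                                       # gradients accumulate
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```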
## Dataset Preparation

The dataset should be organized in the following structure:

```
data/
├── JPEGImages/
│   ├── sample1.jpg
│   ├── sample2.jpg
│   └── ...
├── SegmentationClass/
│   ├── sample1.png
│   ├── sample2.png
│   └── ...
├── train.txt
├── val.txt
└── test.txt
```
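A minimal loader for this layout might look like the sketch below. It assumes each `.txt` file lists one sample basename per line, which is our reading of the structure rather than the repo's documented format.

```python
# Minimal dataset sketch for the layout above. Assumes the train/val/test .txt
# files contain one basename per line (an assumption, not repo documentation).
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class HabitatDataset(Dataset):
    def __init__(self, root="data", split="train"):
        self.root = Path(root)
        self.names = (self.root / f"{split}.txt").read_text().split()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = np.array(Image.open(self.root / "JPEGImages" / f"{name}.jpg"))
        mask = np.array(Image.open(self.root / "SegmentationClass" / f"{name}.png"))
        return image, mask  # add transforms / tensor conversion as needed
```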
The dataset contains 16 classes for habitat segmentation:
- background
- arbor-shrub-grass compound land
- dry land
- grass belt
- scattered trees
- dirt road
- paved road
- forest belt
- woody area
- unused land
- paddy field
- ridge
- construction land
- river
- tidal flats
- water
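For scripting, the class names above can be collected into an index table. The sketch below assumes label ids follow the list order with background = 0, which is our assumption and should be verified against the dataset.

```python
# Class-id table assuming label ids follow the order listed above
# (background = 0); verify against the dataset's palette before relying on it.
CLASSES = [
    "background", "arbor-shrub-grass compound land", "dry land", "grass belt",
    "scattered trees", "dirt road", "paved road", "forest belt", "woody area",
    "unused land", "paddy field", "ridge", "construction land", "river",
    "tidal flats", "water",
]
assert len(CLASSES) == 16
```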
This dataset was constructed specifically for cultivated land system habitats from ultra-high-resolution remote sensing imagery acquired by the FeimaRobotics V500 UAV system at a spatial resolution of 0.1 m. The study area lies in the Hailun River Basin of Hailun City, Heilongjiang Province, China. The dataset contains 800 samples in total, divided into training, validation, and test sets in a 6:1:1 ratio (600/100/100).
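If the list files ever need to be regenerated, a 6:1:1 split of the 800 samples (600/100/100) can be produced along the following lines, again assuming one basename per line in each `.txt` file:

```python
# Sketch of a reproducible 6:1:1 split; the one-basename-per-line list format
# is an assumption about the repo's convention.
import random
from pathlib import Path

names = sorted(p.stem for p in Path("data/JPEGImages").glob("*.jpg"))
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(names)

n = len(names)          # 800 in the paper's dataset -> 600/100/100
splits = {
    "train": names[: n * 6 // 8],
    "val":   names[n * 6 // 8 : n * 7 // 8],
    "test":  names[n * 7 // 8 :],
}
for split, items in splits.items():
    Path(f"data/{split}.txt").write_text("\n".join(items) + "\n")
```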
## Training

Main model (DWFF-Net):

```bash
cd DWFF-Net
./train.sh
```

Configuration file: `config.yaml`
- Uses dynamic-weighted feature fusion with layers [1, 8, 16, -1] (corresponding to layers 1, 8, 16, and 24)
Ablation over the number of fused feature levels:

```bash
cd NWFF-Net
# For different configurations:
./train_1.sh  # Single-level feature (layer -1)
./train_2.sh  # Two-level feature fusion (layers 1, -1)
./train_3.sh  # Three-level feature fusion (layers 1, 16, -1)
./train_4.sh  # Four-level feature fusion (layers 1, 8, 16, -1)
```

Configuration files:
- `config-1.yaml` - Single-layer feature (layer -1)
- `config-2.yaml` - Two-level feature fusion (layers 1, -1)
- `config-3.yaml` - Three-level feature fusion (layers 1, 16, -1)
- `config-4.yaml` - Four-level feature fusion (layers 1, 8, 16, -1)
Static-weight baseline (SWFF-Net):

```bash
cd SWFF-Net
./train.sh
```

Configuration file: `config.yaml`
- Uses static weighted feature fusion with layers [1, 8, 16, -1] (corresponding to layers 1, 8, 16, and 24)
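For contrast with the dynamic module sketched earlier, a static weighted fusion can be thought of as learning a single set of level weights shared by all images. A hypothetical minimal version:

```python
# Hypothetical static-weight counterpart: the level weights are learned
# parameters shared across all inputs, rather than predicted per image.
import torch
import torch.nn as nn

class StaticWeightedFusion(nn.Module):
    def __init__(self, num_levels=4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_levels))

    def forward(self, feats):  # feats: list of (B, C, H, W) tensors
        weights = self.logits.softmax(dim=0)
        return sum(w * f for w, f in zip(weights, feats))
```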
## Evaluation

To evaluate a trained model:

```bash
python main.py --config config.yaml --mode test --checkpoint path/to/checkpoint.ckpt
```

Test results will be saved in the `test_predictions` directory as int8 `.npy` files. The evaluation metrics include Precision, Recall, F1-score, and Intersection over Union (IoU) for each of the 16 classes.
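As a quick sanity check on the saved outputs, per-class IoU can be recomputed from a prediction `.npy` and its ground-truth mask; the file names below are hypothetical:

```python
# Hedged sketch: per-class IoU from a saved prediction (.npy) and its
# ground-truth mask (.png); the file names and layout are assumptions.
import numpy as np
from PIL import Image

pred = np.load("test_predictions/sample1.npy").astype(np.int64)  # (H, W) class ids
gt = np.array(Image.open("data/SegmentationClass/sample1.png"), dtype=np.int64)

num_classes = 16
ious = []
for c in range(num_classes):
    inter = np.logical_and(pred == c, gt == c).sum()
    union = np.logical_or(pred == c, gt == c).sum()
    ious.append(inter / union if union else float("nan"))  # skip absent classes
print("mIoU:", np.nanmean(ious))
```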
## Results

Our method achieves state-of-the-art performance on the habitat segmentation dataset, with an mIoU of 0.6979 and an F1-score of 0.8049, outperforming the baseline network.
The model was trained for 150 epochs using the AdamW optimizer with a cosine-annealed learning rate.
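In PyTorch terms, that schedule corresponds roughly to the sketch below; the learning rate, weight decay, and stand-in model are assumptions, not values from the paper.

```python
# Sketch of the stated schedule: 150 epochs, AdamW, cosine-annealed LR.
# The lr / weight_decay values and the stand-in model are assumptions.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=1)  # placeholder for DWFF-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)

for epoch in range(150):
    # ... run one training epoch here ...
    scheduler.step()
```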
More detailed results can be found in our paper.
## Comparison Experiments

We also compare our method with popular segmentation architectures:
UNet:

```bash
cd compare-exp
./train_unet.sh
```

Configuration: `config_unet.yaml`

DeepLabV3+:

```bash
cd compare-exp
./train_deeplab.sh
```

Configuration: `config_deeplabv3p.yaml`
## Citation

If you find this work useful in your research, please cite our paper:

```bibtex
@article{zheng2025dwffnet,
  title={A Method for Identifying Farmland System Habitat Types Based on the Dynamic-Weighted Feature Fusion Network Model},
  author={Kesong Zheng and Zhi Song and Peizhou Li and Shuyi Yao and Zhenxing Bian},
  year={2025},
  eprint={2511.11659},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.11659},
}
```

## Acknowledgements

This work was supported by the Scientific Research Fund of Liaoning Provincial Education Department of China under Grant No. LJ212510157036. We thank the authors of DINOv3 for releasing their pre-trained models, and the contributors of the open-source libraries used in this project.

