File monitoring service that copies new files from a source directory to a destination. Built for document management systems like Paperless-NGX.
I created this out of a need to use paperless-ngx without having it devour my documentation. I needed a way to continue using my existing organization structure while also using paperless. It does create two sources for the same information, yes, but that's what I needed. And since there was nothing else like it out there, I chose to build this small, efficient, and simple tool.
Real-time monitoring, duplicate detection, file stability checks, persistent state, file filtering.
Complete docker-compose.yml example:
version: "3.8"
services:
dropbox_consumer:
image: trusmith/dropbox-consumer:latest
container_name: dropbox_consumer
user: "${PUID:-1000}:${PGID:-1000}"
volumes:
- /your/source/path:/source:ro
- /your/destination/path:/consume:rw
- ./state:/app/state:rw
environment:
- SOURCE=/source
- DEST=/consume
- RECURSIVE=true
- PRESERVE_DIRS=false
- COPY_EMPTY_DIRS=false
- STATE_DIR=/app/state
- STATE_CLEANUP_DAYS=30
- STATE_BACKUP_COUNT=3
- LOG_LEVEL=INFO
- DEBOUNCE_SECONDS=1.0
- STABILITY_INTERVAL=0.5
- STABILITY_STABLE_ROUNDS=2
- COPY_TIMEOUT=60
- MAX_WORKERS=4
- FILE_INCLUDE_PATTERNS=
- FILE_EXCLUDE_PATTERNS=
- DRY_RUN=false
- DELETE_SOURCE=false
- MAX_FILE_SIZE_MB=0
- COMPRESS_FILES=false
- WEBHOOK_URL=
- RETRY_ATTEMPTS=3
- RETRY_DELAY=2.0
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
restart: unless-stoppedStart the service:
docker-compose up -dMonitor activity:
docker-compose logs -f dropbox_consumerEnvironment variables control behavior. Set in docker-compose.yml or via command line.
| Variable | Default | Description |
|---|---|---|
SOURCE |
/source |
Directory to monitor |
DEST |
/consume |
Target directory for copied files |
RECURSIVE |
true |
Monitor subdirectories |
PRESERVE_DIRS |
false |
Maintain source directory structure |
STATE_DIR |
/app/state |
State persistence location |
LOG_LEVEL |
INFO |
Logging level (DEBUG, INFO, WARNING, ERROR) |
| Variable | Default | Description |
|---|---|---|
DEBOUNCE_SECONDS |
1.0 |
Wait time for event debouncing |
STABILITY_INTERVAL |
0.5 |
Interval between file stability checks |
STABILITY_STABLE_ROUNDS |
2 |
Consecutive stable checks required |
COPY_TIMEOUT |
60 |
Maximum seconds to wait for file stability |
MAX_WORKERS |
4 |
Concurrent processing threads |
| Variable | Default | Description |
|---|---|---|
FILE_INCLUDE_PATTERNS |
(empty) | Comma-separated patterns to include (e.g., *.pdf,*.jpg) |
FILE_EXCLUDE_PATTERNS |
(empty) | Comma-separated patterns to exclude (e.g., *.tmp,*.swp) |
When include patterns are specified, only files matching at least one pattern are processed. Exclude patterns are applied first and take precedence.
Examples:
# Only copy PDF and image files
environment:
- FILE_INCLUDE_PATTERNS=*.pdf,*.jpg,*.png,*.jpeg
# Exclude temporary and system files
environment:
- FILE_EXCLUDE_PATTERNS=*.tmp,*.swp,.DS_Store,Thumbs.db
# Combine both
environment:
- FILE_INCLUDE_PATTERNS=*.pdf,*.docx
- FILE_EXCLUDE_PATTERNS=*draft*,*temp*| Variable | Default | Description |
|---|---|---|
COPY_EMPTY_DIRS |
false |
Copy empty directories (requires PRESERVE_DIRS=true) |
DRY_RUN |
false |
Simulate operations without copying files |
DELETE_SOURCE |
false |
Delete source files after successful copy |
MAX_FILE_SIZE_MB |
0 |
Maximum file size in MB (0 = unlimited) |
COMPRESS_FILES |
false |
Compress files with gzip during copy |
WEBHOOK_URL |
(empty) | HTTP endpoint for copy notifications |
RETRY_ATTEMPTS |
3 |
Number of retry attempts for failed operations |
RETRY_DELAY |
2.0 |
Initial retry delay in seconds (exponential backoff) |
STATE_CLEANUP_DAYS |
30 |
Days to retain old state entries |
STATE_BACKUP_COUNT |
3 |
Number of state file backups to keep |
PUID |
1000 |
User ID for file operations |
PGID |
1000 |
Group ID for file operations |
On startup, the service snapshots all existing files in the source directory and marks them as processed. Only new files added after startup are copied.
File stability checks prevent copying incomplete files. The service monitors file size and waits for it to remain unchanged for the configured stability period.
Duplicate detection uses SHA-256 hashes. Files with identical content are skipped regardless of filename.
State is persisted to disk and survives container restarts. Files processed before a restart will not be reprocessed.
When PRESERVE_DIRS=true, the service maintains the relative path structure:
source/2024/invoices/document.pdf → destination/2024/invoices/document.pdf
When disabled, all files are copied to the destination root:
source/2024/invoices/document.pdf → destination/document.pdf
The service runs as the user specified by PUID and PGID. Set these to match your destination directory ownership:
id $(whoami)Update docker-compose.yml:
user: "${PUID:-1000}:${PGID:-1000}"The state directory must be writable by this user.
Logs are written to stdout and managed by Docker. Configure rotation in docker-compose.yml:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"Run locally without Docker:
python3 -m pip install -r requirements.txt
export SOURCE=/path/to/source
export DEST=/path/to/destination
python3 main.pyCheck logs for errors:
docker-compose logs -f dropbox_consumerVerify permissions:
docker-compose exec dropbox_consumer ls -la /source
docker-compose exec dropbox_consumer ls -la /consumeEnsure PUID and PGID match the destination directory owner:
ls -la /your/destination/path
id $(whoami)Set correct values in docker-compose.yml.
Verify state directory is writable:
ls -la ./state/
docker-compose exec dropbox_consumer ls -la /app/stateFix permissions if needed:
sudo chown -R $(id -u):$(id -g) ./state/Reduce worker threads or increase debounce time:
environment:
- MAX_WORKERS=1
- DEBOUNCE_SECONDS=5.0Image: trusmith/dropbox-consumer:latest
Pull the image:
docker pull trusmith/dropbox-consumer:latestAvailable tags:
latest- Most recent stable releasev3.0- Version 3.0 releasemain- Latest development build
Licensed under the MIT License.
See the LICENSE file for details.
If this project helped you, please consider supporting my work:
