This repository contains an AWS SAM template and a Java AWS Lambda function designed to back up Git repositories to AWS S3. It provides infrastructure-as-code for a Lambda named LambdaInfraBackup, plus a Java handler in LambdaRepositoryBackup.
Current Status: The project is complete and functional. The Lambda function can clone Git repositories, package them into compressed archives, and upload them to AWS S3 on a scheduled basis.
This repository was created to provide automated backups of Git repositories to AWS S3. The intended use case is to periodically clone Git repositories and store them securely in S3 for disaster recovery, archival purposes, or compliance requirements.
Workflow:
- Lambda function is triggered on a schedule (daily at 2 AM UTC by default via CloudWatch Events)
- Function clones the specified Git repository using JGit
- Repository is packaged as a compressed tar.gz archive
- Package is uploaded to the designated S3 bucket with a timestamp
- Temporary files are cleaned up automatically
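The cleanup step at the end of the workflow can be sketched as follows. This is a minimal illustration, not the project's actual code: the class `TempWorkspaceSketch` and its methods are hypothetical names, and it assumes the work happens inside a directory created under the JVM's default temp location (`/tmp` on Lambda).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: runs some work inside a temp directory and always removes it.
final class TempWorkspaceSketch {

    // Illustrative callback type; the real handler may be structured differently.
    interface Work<T> {
        T apply(Path workDir) throws IOException;
    }

    static <T> T withTempDir(String prefix, Work<T> work) throws IOException {
        // Created under java.io.tmpdir, which is /tmp on typical Linux hosts.
        Path workDir = Files.createTempDirectory(prefix);
        try {
            return work.apply(workDir);
        } finally {
            // Runs on success and on failure, so nothing is left behind.
            deleteRecursively(workDir);
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order lists children before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : ordered) {
            Files.delete(path);
        }
    }
}
```

Putting the deletion in a `finally` block is one way to make sure a failed clone or upload does not leave large files behind between invocations.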
Prerequisites:
- Java 11 (from LambdaRepositoryBackup/build.gradle and template.yaml runtime).
- Gradle wrapper (scripts in LambdaRepositoryBackup/gradlew and LambdaRepositoryBackup/gradlew.bat).
- AWS CLI and SAM CLI for deployment
Run tests locally:
cd LambdaRepositoryBackup
./gradlew test
Build locally:
cd LambdaRepositoryBackup
./gradlew build
Deployment:
# Build the SAM application
sam build
# Deploy with guided configuration
sam deploy --guided
# Or deploy with parameters
sam deploy \
--parameter-overrides \
GitRepoUrl=https://github.com/yourusername/yourrepo.git \
S3BucketName=your-backup-bucket \
S3Prefix=backups
Troubleshooting:
- Ensure the S3 bucket exists or is created before deployment
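- The Lambda function must be able to reach the Git repository. Public HTTPS repositories work out of the box; private repositories need credentials (SSH keys or tokens), and support for them is still listed as planned work.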
- Check CloudWatch Logs for detailed execution logs
To verify the implementation works correctly, run:
cd LambdaRepositoryBackup
./gradlew test
Expected output: All tests should pass, including:
- AppTest.testMissingGitRepoUrl - checks environment variable validation
- AppTest.testHandlerWithMockServices - checks that the handler works with mock services
- GitServiceTest - checks Git repository cloning logic
- ArchiveServiceTest - checks archive creation
- S3ServiceTest - checks S3 upload validation logic
- Build should complete successfully with message: BUILD SUCCESSFUL
This validates:
- The Lambda handler correctly processes ScheduledEvent inputs
- Environment variable validation works correctly
- Git cloning, archiving, and S3 upload services have proper error handling
- The Java compilation and test infrastructure work correctly
Build verification:
cd LambdaRepositoryBackup
./gradlew build
Expected: Build completes successfully, producing the compiled Lambda function code.
flowchart TD
A[CloudWatch Events<br/>Scheduled Trigger] --> B[LambdaInfraBackup<br/>AWS Lambda Function]
B --> C[GitService<br/>Clone Repository with JGit]
C --> D[ArchiveService<br/>Package as tar.gz]
D --> E[S3Service<br/>Upload to S3]
E --> F[S3 Bucket<br/>Backup Storage]
G[template.yaml<br/>AWS SAM Template] -.defines.-> B
Implementation: The SAM template defines a Lambda function using the Java handler LambdaRepositoryBackup.App::handleRequest. The handler orchestrates the backup workflow using three services: GitService for cloning repositories, ArchiveService for creating compressed archives, and S3Service for uploading to S3.
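The way the handler chains the three services could look roughly like the sketch below. The interfaces `Cloner`, `Archiver`, and `Uploader` are hypothetical stand-ins for GitService, ArchiveService, and S3Service; their real method names and signatures are not documented here, so treat every signature as an assumption.

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical sketch of clone -> archive -> upload orchestration.
final class BackupOrchestrationSketch {

    // Illustrative stand-ins for the project's service classes.
    interface Cloner {
        Path cloneRepo(String repoUrl, Path workDir) throws IOException;
    }

    interface Archiver {
        Path createArchive(Path checkout, Path workDir) throws IOException;
    }

    interface Uploader {
        void upload(Path archive, String bucket, String key) throws IOException;
    }

    private final Cloner cloner;
    private final Archiver archiver;
    private final Uploader uploader;

    BackupOrchestrationSketch(Cloner cloner, Archiver archiver, Uploader uploader) {
        this.cloner = cloner;
        this.archiver = archiver;
        this.uploader = uploader;
    }

    // Returns a human-readable result instead of throwing, so the caller can log it.
    String backup(String repoUrl, String bucket, String key, Path workDir) {
        try {
            Path checkout = cloner.cloneRepo(repoUrl, workDir);
            Path archive = archiver.createArchive(checkout, workDir);
            uploader.upload(archive, bucket, key);
            return "Backup uploaded to s3://" + bucket + "/" + key;
        } catch (IOException e) {
            return "Backup failed: " + e.getMessage();
        }
    }
}
```

Passing the services in through the constructor is what lets tests substitute mock services for JGit and S3.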
- template.yaml: AWS SAM template defining the Lambda function, runtime, memory, timeout, environment variables, and S3 permissions.
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/App.java: Lambda handler for scheduled events that orchestrates the backup workflow.
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/GitService.java: Service for cloning Git repositories using JGit.
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/ArchiveService.java: Service for creating tar.gz archives using Apache Commons Compress.
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/S3Service.java: Service for uploading files to S3 using AWS SDK v2.
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/Util.java: Logging helper that prints environment, context, and event.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/AppTest.java: JUnit tests for the handler.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/GitServiceTest.java: JUnit tests for GitService.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/ArchiveServiceTest.java: JUnit tests for ArchiveService.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/S3ServiceTest.java: JUnit tests for S3Service.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/TestContext.java: Test Context implementation.
- LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/TestLogger.java: Test logger implementation.
- LambdaRepositoryBackup/events/ScheduleEvent.json: Sample scheduled event payload for tests.
- LambdaRepositoryBackup/build.gradle: Java dependencies and build configuration.
- Lambda handler: LambdaRepositoryBackup.App::handleRequest (declared in template.yaml).
- Input event type: ScheduledEvent (from LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/App.java).
- Example event payload used in tests: LambdaRepositoryBackup/events/ScheduleEvent.json.
- SAM function config in template.yaml:
  - Runtime: java11
  - Memory: 512 MB (increased from 128 to handle repository cloning and archiving)
  - Timeout: 300 seconds (5 minutes, increased from 5 seconds)
  - Environment variables:
    - GIT_REPO_URL: URL of the Git repository to back up (required)
    - S3_BUCKET: Name of the S3 bucket for backups (required)
    - S3_PREFIX: Prefix for S3 object keys (default: "backups")
    - JAVA_TOOL_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=1
  - Policies: S3CrudPolicy for S3 bucket access
  - Events: Scheduled daily at 2 AM UTC (cron(0 2 * * ? *))
- Parameters (configurable at deployment):
  - GitRepoUrl: Default "https://github.com/example/repo.git"
  - S3BucketName: Default "git-backup-bucket"
  - S3Prefix: Default "backups"
- No .env files or secrets present.
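Reading and validating these environment variables might be done along the following lines. This is a sketch under stated assumptions: the class `BackupConfigSketch` and its members are hypothetical, and only the variable names, the required/optional split, and the "backups" default are taken from the configuration above.

```java
import java.util.Map;

// Hypothetical value object for the function's environment configuration.
final class BackupConfigSketch {
    final String gitRepoUrl;
    final String s3Bucket;
    final String s3Prefix;

    private BackupConfigSketch(String gitRepoUrl, String s3Bucket, String s3Prefix) {
        this.gitRepoUrl = gitRepoUrl;
        this.s3Bucket = s3Bucket;
        this.s3Prefix = s3Prefix;
    }

    // Accepts any map so tests can pass fixed values instead of System.getenv().
    static BackupConfigSketch fromEnv(Map<String, String> env) {
        String repo = required(env, "GIT_REPO_URL");
        String bucket = required(env, "S3_BUCKET");
        String prefix = env.get("S3_PREFIX");
        if (prefix == null || prefix.isBlank()) {
            prefix = "backups"; // documented default
        }
        return new BackupConfigSketch(repo, bucket, prefix.trim());
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing required environment variable: " + name);
        }
        return value.trim();
    }
}
```

In a handler this would typically be called as `BackupConfigSketch.fromEnv(System.getenv())` before any cloning starts, so a misconfigured deployment fails fast.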
- AWS Lambda (runtime and handler): template.yaml.
- Java dependencies (from LambdaRepositoryBackup/build.gradle):
  - com.amazonaws:aws-lambda-java-core:1.2.1 - AWS Lambda core library
  - com.amazonaws:aws-lambda-java-events:3.11.0 - AWS Lambda event types
  - com.amazonaws:aws-lambda-java-tests:1.1.1 - AWS Lambda testing utilities
  - com.google.code.gson:gson:2.9.0 - JSON serialization
  - org.slf4j:slf4j-api:2.0.1 - Logging API
  - software.amazon.awssdk:s3:2.20.26 - AWS SDK v2 for S3 operations
  - org.eclipse.jgit:org.eclipse.jgit:6.7.0.202309050840-r - Git repository operations
  - org.apache.commons:commons-compress:1.24.0 - Archive creation (tar.gz)
  - junit:junit:4.13.2 (tests)
- Tests: JUnit tests in LambdaRepositoryBackup/src/test/java/LambdaRepositoryBackup/:
  - AppTest.java - Tests for the main handler
  - GitServiceTest.java - Tests for Git operations
  - ArchiveServiceTest.java - Tests for archive creation
  - S3ServiceTest.java - Tests for S3 upload operations
- CI: GitHub Actions workflow in .github/workflows/sam-pipeline.yml.
- Linting/formatting: None configured.
- Static analysis/dependency scanning: GitHub Dependabot configured in .github/dependabot.yml.
- Build commands:
  - Test: ./gradlew test
  - Build: ./gradlew build
Status: Clean
Reviewed areas:
- README.md
- template.yaml
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/App.java
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/GitService.java
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/ArchiveService.java
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/S3Service.java
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/Util.java
- All test files
- LambdaRepositoryBackup/events/ScheduleEvent.json
- LambdaRepositoryBackup/build.gradle
Findings:
- LambdaRepositoryBackup/src/main/java/LambdaRepositoryBackup/Util.java logs all environment variables and context, which could expose secrets if they are injected at runtime.
Actions taken:
- None. Users should be aware that environment variables are logged for debugging purposes.
Notes:
- For private repositories, users should configure Git credentials using AWS Secrets Manager or Parameter Store and access them within the Lambda function, not via environment variables.
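One possible mitigation for the logging finding is to mask values whose names look sensitive before they are printed. The sketch below is illustrative only: `EnvRedactionSketch` is a hypothetical class, not part of Util.java, and the marker list is an assumption that would need tuning for a real deployment.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that masks sensitive-looking environment variables.
final class EnvRedactionSketch {

    // Illustrative name fragments; any variable containing one is masked.
    private static final String[] SENSITIVE_MARKERS = {
        "SECRET", "TOKEN", "PASSWORD", "KEY", "CREDENTIAL"
    };

    // Returns a sorted copy that is safe to log; the input map is not modified.
    static Map<String, String> redact(Map<String, String> env) {
        Map<String, String> safe = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String value = isSensitive(entry.getKey()) ? "****" : entry.getValue();
            safe.put(entry.getKey(), value);
        }
        return safe;
    }

    static boolean isSensitive(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        for (String marker : SENSITIVE_MARKERS) {
            if (upper.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on name fragments errs on the side of masking too much, which is usually the safer failure mode for logs.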
The implementation provides a complete solution for automated Git repository backups to S3:
- Scheduled Execution: The Lambda function is triggered daily at 2 AM UTC via CloudWatch Events (configurable via the SAM template).
- Git Cloning: Uses Eclipse JGit library to clone repositories into a temporary directory in /tmp. Supports public HTTPS repositories out of the box.
- Archive Creation: Creates tar.gz archives using Apache Commons Compress, excluding the .git directory to reduce archive size while preserving all repository files.
- S3 Upload: Uses AWS SDK v2 to upload archives to S3 with timestamped filenames (format: repo-backup-YYYYMMDD-HHmmss.tar.gz).
- Resource Management: Automatically cleans up temporary files and directories after backup completion or on error.
- Error Handling: Validates environment variables, handles exceptions, and returns descriptive error messages.
- Testing: Comprehensive unit tests cover input validation, error cases, and successful execution paths using mock services.
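The timestamped object name from the S3 Upload item could be built as in the sketch below. `BackupKeySketch` is a hypothetical class; the use of UTC and the way the prefix is joined with a slash are assumptions, since only the filename format repo-backup-YYYYMMDD-HHmmss.tar.gz is stated above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building the S3 object key of a backup archive.
final class BackupKeySketch {

    // Assumes timestamps are rendered in UTC, matching the UTC schedule.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    static String objectKey(String prefix, Instant when) {
        String fileName = "repo-backup-" + STAMP.format(when) + ".tar.gz";
        // Strip leading and trailing slashes so the key never contains "//".
        String cleaned = prefix == null ? "" : prefix.replaceAll("^/+|/+$", "");
        return cleaned.isEmpty() ? fileName : cleaned + "/" + fileName;
    }
}
```

For example, `BackupKeySketch.objectKey("backups", Instant.parse("2024-01-15T02:00:05Z"))` yields `backups/repo-backup-20240115-020005.tar.gz`. Because the timestamp sorts lexicographically, listing the prefix returns backups in chronological order.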
Security & Credentials:
- P1 / S: Support for private repositories with authentication (SSH keys, personal access tokens via Secrets Manager)
- P1 / S: Filter sensitive environment variables in Util.logEnvironment to prevent credential exposure
- P2 / S: Encryption at rest for S3 backups (KMS integration)
Operational Features:
- P2 / M: Implement backup retention and cleanup policies (delete old backups)
- P2 / S: Add support for multiple repository backups in a single execution
- P2 / S: Add support for incremental backups
- P2 / M: Add CloudWatch alarms for backup failures
- P2 / S: Add metrics for backup size and duration
Documentation:
- P2 / S: Document IAM permissions required for accessing private repositories
- P2 / S: Document how to configure Git credentials securely
Testing:
- P2 / S: Add integration tests with real Git repositories and S3 buckets
- P2 / S: Add performance tests for large repositories
Developer experience:
- P2 / S: Add local testing instructions with SAM CLI
- P2 / S: Add contribution guidelines
Use Cases: This repository provides an automated solution for backing up Git repositories to AWS S3, useful for:
- Disaster recovery and business continuity
- Compliance and audit requirements
- Creating snapshots of repository state at regular intervals
- Archiving repositories before major changes or migrations
- Backing up public repositories for offline access
- Creating point-in-time backups for rollback purposes
Current State: The project is complete and functional. It provides:
- Scheduled Lambda function that runs daily
- Automated Git repository cloning using JGit
- Compression and archiving with tar.gz format
- Upload to S3 with timestamped filenames
- Comprehensive error handling and logging
- Unit tests covering input validation, error cases, and successful execution paths of the core functionality
Project type: AWS SAM template + Java Lambda (Gradle)
Primary domain: Git repository backup to AWS S3 (infrastructure-as-code for scheduled Lambda execution)
Functionality: Complete - Scheduled backup of Git repositories to AWS S3
Current status: Fully implemented and tested
Core entities: Lambda function (LambdaInfraBackup), handler (App), event (ScheduledEvent), services (GitService, ArchiveService, S3Service)
Extension points: Add new Lambda functions in template.yaml, add handlers in LambdaRepositoryBackup/src/main/java, customize backup schedule in template
Areas safe to modify: Service implementations for custom Git authentication, archive formats, or S3 upload logic; schedule configuration; environment variables
Areas requiring caution and why:
- Util.logEnvironment because it logs all env/context and may expose secrets
- template.yaml IAM policies because they control S3 access permissions
- Temporary file cleanup logic in App.java because failures could leave large files in /tmp
- Archive exclusion logic in ArchiveService.java because including .git significantly increases backup size
Canonical commands:
- Build: cd LambdaRepositoryBackup && ./gradlew build
- Test: cd LambdaRepositoryBackup && ./gradlew test
- Deploy: sam build && sam deploy --guided
- Local test: sam local invoke LambdaInfraBackup -e LambdaRepositoryBackup/events/ScheduleEvent.json