fix: RFC 2047 encode non-ASCII display names in address headers #405

Open
HaseU-git wants to merge 1 commit into googleworkspace:main from HaseU-git:fix/rfc2047-encode-address-headers
@HaseU-git
Description

Non-ASCII display names in To, From, Cc, and Bcc headers were not RFC 2047 encoded, causing mojibake when the draft was viewed in Gmail (e.g. Japanese 下野祐太 appeared as ä¸‹é‡Žç¥ å¤ª).
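The garbling described above is what happens when raw UTF-8 bytes are read as a single-byte charset. A minimal sketch, assuming ISO-8859-1 (Latin-1) as the wrong charset; the rendering quoted in the description looks closer to Windows-1252, so a few characters differ, but the mechanism is the same. The function name is hypothetical and not part of this PR.

```rust
/// Reinterpret every UTF-8 byte of `s` as one Latin-1 character.
/// This mimics a reader that assumes a single-byte charset for a
/// header that actually carries raw UTF-8.
fn misread_utf8_as_latin1(s: &str) -> String {
    // `char::from(u8)` maps byte 0xNN to code point U+00NN.
    s.bytes().map(char::from).collect()
}
```

Calling `misread_utf8_as_latin1("下野祐太")` turns four characters (twelve UTF-8 bytes) into twelve unrelated characters, beginning with `ä¸`.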

The root cause was that MessageBuilder::build() applied encode_header_value() (RFC 2047) only to the Subject header, while address headers used only sanitize_header_value(), which strips CRLF but does not encode non-ASCII characters.
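For context, an RFC 2047 encoded-word has the shape `=?charset?encoding?encoded-text?=`. The following is a minimal sketch of the Base64 ("B") variant, not the project's encode_header_value(): the names `base64_encode` and `encode_word_sketch` are hypothetical, and the sketch ignores the 75-character limit that RFC 2047 places on each encoded-word.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with '=' padding, written out to avoid external crates.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: leave ASCII text alone, otherwise wrap the UTF-8
/// bytes in a single "B" encoded-word.
fn encode_word_sketch(text: &str) -> String {
    if text.is_ascii() {
        return text.to_string();
    }
    format!("=?UTF-8?B?{}?=", base64_encode(text.as_bytes()))
}
```

With this sketch, a display name such as 下野祐太 becomes one ASCII-only token that a mail client can decode back to the original text, while an ASCII name such as Alice is returned unchanged.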

Changes

  • Added encode_address_header_value() — parses address formats ("Name" <addr>, Name <addr>, bare addr, comma-separated lists), encodes only the display-name portion via RFC 2047, and preserves the angle-bracket email address as-is.
  • Applied encode_address_header_value() to To, From, Cc, and Bcc in MessageBuilder::build().
  • Added 5 unit tests for encode_address_header_value() (bare email, ASCII display name, non-ASCII display name, multiple addresses, unquoted non-ASCII).
  • Added 1 integration test test_message_builder_non_ascii_address_headers verifying all four address headers are properly encoded.

Closes #404

Checklist:

  • My code follows the AGENTS.md guidelines (no generated google-* crates).
  • I have run cargo fmt --all to format the code perfectly.
  • I have run cargo clippy -- -D warnings and resolved all warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have provided a Changeset file (e.g. via pnpx changeset) to document my changes.

…leworkspace#404)

Apply encode_address_header_value() to To, From, Cc, and Bcc headers
in MessageBuilder::build() so that non-ASCII display names (e.g.
Japanese characters) are properly RFC 2047 encoded. Previously only
the Subject header was encoded, causing mojibake in address headers.

Closes googleworkspace#404

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@changeset-bot
changeset-bot bot commented Mar 11, 2026

🦋 Changeset detected

Latest commit: bd170fa

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@googleworkspace/cli Patch

@googleworkspace-bot googleworkspace-bot added the area: core Core CLI parsing, commands, error handling, utilities label Mar 11, 2026
@google-cla
google-cla bot commented Mar 11, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where non-ASCII display names in email address headers were not being properly encoded according to RFC 2047, resulting in garbled text (mojibake) when emails were viewed in certain clients. The changes introduce a dedicated function to handle the encoding of display names within address headers and apply it consistently across all relevant email fields, ensuring international characters are correctly preserved and displayed.

Highlights

  • RFC 2047 Encoding for Address Headers: Implemented a new helper function, encode_address_header_value, to correctly apply RFC 2047 encoding to non-ASCII display names within email address headers (e.g., 'To', 'From', 'Cc', 'Bcc').
  • Integration with MessageBuilder: Integrated the new encoding function into MessageBuilder::build() to ensure that 'To', 'From', 'Cc', and 'Bcc' headers are properly encoded, resolving issues with mojibake in email clients.
  • Comprehensive Testing: Added five unit tests for encode_address_header_value to cover various address formats and non-ASCII character scenarios, and an integration test to validate the end-to-end encoding behavior of MessageBuilder.
Changelog
  • .changeset/fix-rfc2047-address-headers.md
    • Added a new changeset file documenting the fix for RFC 2047 encoding in address headers.
  • src/helpers/gmail/mod.rs
    • Added encode_address_header_value function to handle RFC 2047 encoding for address display names.
    • Updated MessageBuilder to apply encode_address_header_value to 'To', 'From', 'Cc', and 'Bcc' headers.
    • Expanded MessageBuilder documentation to mention encoding of address display names.
    • Added five unit tests for encode_address_header_value covering various scenarios.
    • Added an integration test test_message_builder_non_ascii_address_headers to verify correct encoding in MessageBuilder.
Activity
  • No human activity has been recorded on this pull request yet.
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to correctly implement RFC 2047 encoding for email address headers by introducing the encode_address_header_value function, applying it to To, From, Cc, and Bcc headers. While a security audit confirmed no high or critical severity vulnerabilities, a critical issue was identified in the parsing logic of encode_address_header_value. The current implementation uses a simple string split on commas, which can lead to malformed headers for email addresses with commas in quoted display names. This requires a fix by reusing existing robust parsing logic and adding a unit test.

Comment on lines +280 to +302
pub(super) fn encode_address_header_value(value: &str) -> String {
    value
        .split(',')
        .map(|addr| {
            let trimmed = addr.trim();
            if let Some(open) = trimmed.rfind('<') {
                let display = trimmed[..open].trim();
                let angle_addr = &trimmed[open..]; // "<email>"
                if display.is_empty() {
                    trimmed.to_string()
                } else {
                    // Strip surrounding quotes if present.
                    let unquoted = display.trim_matches('"').trim();
                    format!("{} {}", encode_header_value(unquoted), angle_addr)
                }
            } else {
                // Bare address (no display name): nothing to encode.
                trimmed.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(", ")
}
critical

This function incorrectly uses .split(',') to parse a list of email addresses. This will fail for valid addresses where the display name contains a comma inside quotes, such as "Doe, John" <john@example.com>. This is a critical bug that can lead to malformed headers and email delivery issues.
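To make the failure concrete, here is a small sketch of what a plain comma split does to a quoted display name. The helper name `naive_split` is hypothetical and only mirrors the `.split(',')` plus `trim()` steps of the function under review.

```rust
/// Split on every comma, ignoring quoting, and trim each piece.
fn naive_split(value: &str) -> Vec<&str> {
    value.split(',').map(str::trim).collect()
}
```

For the single address `"Doe, John" <john@example.com>`, this yields two fragments: `"Doe` and `John" <john@example.com>`. The first has no angle bracket and is passed through as if it were a bare address; the second has its stray quote trimmed, so the rebuilt header ends up with an unbalanced quote and a spurious extra recipient.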

The codebase already has a correct implementation for this in src/helpers/gmail/reply.rs: the split_mailbox_list function. It correctly handles quoted strings. I strongly recommend refactoring to use that logic here. This would likely involve moving split_mailbox_list to this shared mod.rs file to make it accessible.
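The actual split_mailbox_list in reply.rs is not shown in this thread, so the following is only a sketch of what a quote-aware splitter generally looks like, under the assumption that it tracks double quotes and backslash escapes. The name `split_mailbox_list_sketch` and its return type are hypothetical.

```rust
/// Split a mailbox list on commas that sit outside double-quoted strings.
fn split_mailbox_list_sketch(value: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut in_quotes = false;
    let mut escaped = false;
    let mut start = 0;
    for (i, c) in value.char_indices() {
        if escaped {
            // The previous character was a backslash inside quotes;
            // this character is literal and cannot toggle quoting.
            escaped = false;
            continue;
        }
        match c {
            '\\' if in_quotes => escaped = true,
            '"' => in_quotes = !in_quotes,
            ',' if !in_quotes => {
                parts.push(value[start..i].trim());
                start = i + 1; // ',' is one byte, so this stays on a char boundary
            }
            _ => {}
        }
    }
    parts.push(value[start..].trim());
    // Drop empty pieces produced by stray or trailing commas.
    parts.into_iter().filter(|p| !p.is_empty()).collect()
}
```

With this approach, `"Doe, John" <john@example.com>, Jane <jane@example.com>` splits into exactly two mailboxes, and the comma inside the quoted name is left alone.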

Here's how the corrected function could look after making split_mailbox_list available:

pub(super) fn encode_address_header_value(value: &str) -> String {
    split_mailbox_list(value) // Use the robust, quote-aware splitter
        .iter()
        .map(|addr| {
            let trimmed = addr.trim(); // addr is already a single, complete address
            if let Some(open) = trimmed.rfind('<') {
                let display = trimmed[..open].trim();
                let angle_addr = &trimmed[open..]; // "<email>"
                if display.is_empty() || display.is_ascii() {
                    // Nothing to encode. Keep any surrounding quotes so that a
                    // name such as "Doe, John" remains a single quoted string.
                    trimmed.to_string()
                } else {
                    // Strip surrounding quotes if present, then RFC 2047 encode.
                    let unquoted = display.trim_matches('"').trim();
                    format!("{} {}", encode_header_value(unquoted), angle_addr)
                }
            } else {
                // Bare address (no display name): nothing to encode.
                trimmed.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(", ")
}

Additionally, please add a new unit test to cover this comma-in-display-name scenario to prevent regressions.

#[test]
fn test_encode_address_header_value_quoted_comma_in_name() {
    let input = r#""Doe, John" <john.doe@example.com>, "Smith, Jane" <jane.smith@example.com>"#;
    let result = encode_address_header_value(input);
    // ASCII display names pass through unchanged, so the quotes that protect
    // the embedded commas must survive and the names must not be split.
    assert!(result.contains(r#""Doe, John" <john.doe@example.com>"#));
    assert!(result.contains(r#""Smith, Jane" <jane.smith@example.com>"#));
    assert_eq!(result.matches('<').count(), 2, "Should be exactly two addresses");
}

Labels

area: core Core CLI parsing, commands, error handling, utilities

Development

Successfully merging this pull request may close these issues.

Gmail draft update: non-ASCII CC/From/BCC headers not RFC 2047 encoded (mojibake)

2 participants