Multipart body should not be map[string][]byte #822

@abiriadev

First of all, thanks for the wonderful project! colly has saved our team a lot of time!!

Context

ref: #8, #33

According to RFC 7578, Section 4.3:

4.3. Multiple Files for One Form Field

The form data for a form field might include multiple files.
[RFC2388] suggested that multiple files for a single form field be transmitted using a nested "multipart/mixed" part. This usage is deprecated.

To match widely deployed implementations, multiple files MUST be sent by supplying each file in a separate part but all with the same "name" parameter.

Receiving applications intended for wide applicability (e.g., multipart/form-data parsing libraries) SHOULD also support the older method of supplying multiple files.

This practice is unsurprisingly common, and I am facing exactly this case.

The issue

The name parameter does not have to be unique. There are a few common cases where a duplicated name is required (e.g., when uploading an array of files), and these cases should be properly covered.

colly/colly.go

Lines 551 to 559 in 99b7fb1

// PostMultipart starts a collector job by creating a Multipart POST request
// with raw binary data. PostMultipart also calls the previously provided callbacks
func (c *Collector) PostMultipart(URL string, requestData map[string][]byte) error {
	boundary := randomBoundary()
	hdr := http.Header{}
	hdr.Set("Content-Type", "multipart/form-data; boundary="+boundary)
	hdr.Set("User-Agent", c.UserAgent)
	return c.scrape(URL, "POST", 1, createMultipartReader(boundary, requestData), nil, hdr, true)
}

colly/colly.go

Lines 1461 to 1469 in 99b7fb1

	buffer.WriteString("Content-type: multipart/form-data; boundary=" + boundary + "\n\n")
	for contentType, content := range data {
		buffer.WriteString(dashBoundary + "\n")
		buffer.WriteString("Content-Disposition: form-data; name=" + contentType + "\n")
		buffer.WriteString(fmt.Sprintf("Content-Length: %d \n\n", len(content)))
		buffer.Write(content)
		buffer.WriteString("\n")
	}
	buffer.WriteString(dashBoundary + "--\n\n")

Unfortunately, the current implementation accepts map[string][]byte, which forces every name to be unique. Because Go does not specify map iteration order, the order of the parts is not stable either.

Suggestion

Maybe we can accept []Subpart so that:

  1. The order of subparts is guaranteed
  2. filename and other metadata can be optionally included
  3. Duplicate name fields are allowed

and so on.

I would love to hear your opinion! If you think this is feasible, I will start working on it.
