File API

W3C Working Draft,

More details about this document
This version:
https://www.downtownmelody.com/_x/d3d3LnczLm9yZw/TR/2024/WD-FileAPI-20241204/
Latest published version:
https://www.downtownmelody.com/_x/d3d3LnczLm9yZw/TR/FileAPI/
Editor's Draft:
https://www.downtownmelody.com/_x/dzNjLmdpdGh1Yi5pbw/FileAPI/
Previous Versions:
History:
https://www.downtownmelody.com/_x/d3d3LnczLm9yZw/standards/history/FileAPI/
Feedback:
GitHub
Inline In Spec
Editor:
(Google)
Former Editor:
Arun Ranganathan (Mozilla Corporation)
Tests:
web-platform-tests FileAPI/ (ongoing work)

Abstract

This specification provides an API for representing file objects in web applications, as well as programmatically selecting them and accessing their data. This includes:

Additionally, this specification defines objects to be used within threaded web applications for the synchronous reading of files.

§ 10 Requirements and Use Cases covers the motivation behind this specification.

This API is designed to be used in conjunction with other APIs and elements on the web platform, notably: XMLHttpRequest (e.g. with an overloaded send() method for File or Blob arguments), postMessage(), DataTransfer (part of the drag and drop API defined in [HTML]) and Web Workers. Additionally, it should be possible to programmatically obtain a list of files from the input element when it is in the File Upload state [HTML]. These kinds of behaviors are defined in the appropriate affiliated specifications.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.downtownmelody.com/_x/d3d3LnczLm9yZw/TR/.

This document was published by the Web Applications Working Group as a Working Draft. This document is intended to become a W3C Recommendation.

Previous discussion of this specification has taken place on two other mailing lists: public-webapps@w3.org (archive) and public-webapi@w3.org (archive). Ongoing discussion will be on the public-webapps@w3.org mailing list.

This draft consists of changes made to the previous Last Call Working Draft. Please send comments to the public-webapi@w3.org as described above. You can see Last Call Feedback on the W3C Wiki: https://www.downtownmelody.com/_x/d3d3LnczLm9yZw/wiki/Webapps/LCWD-FileAPI-20130912

An implementation report is automatically generated from the test suite.

This document was published by the Web Applications Working Group as a Working Draft using the Recommendation track. Feedback and comments on this specification are welcome. Please use GitHub issues Historical discussions can be found in the public-webapps@w3.org archives.

Publication as a Working Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

This section is informative.

Web applications should have the ability to manipulate as wide as possible a range of user input, including files that a user may wish to upload to a remote server or manipulate inside a rich web application. This specification defines the basic representations for files, lists of files, errors raised by access to files, and programmatic ways to read files. Additionally, this specification also defines an interface that represents "raw data" which can be asynchronously processed on the main thread of conforming user agents. The interfaces and API defined in this specification can be used with other interfaces and APIs exposed to the web platform.

The File interface represents file data typically obtained from the underlying file system, and the Blob interface ("Binary Large Object" - a name originally introduced to web APIs in Google Gears) represents immutable raw data. File or Blob reads should happen asynchronously on the main thread, with an optional synchronous API used within threaded web applications. An asynchronous API for reading files prevents blocking and UI "freezing" on a user agent’s main thread. This specification defines an asynchronous API based on an event model to read and access a File or Blob’s data. A FileReader object provides asynchronous read methods to access that file’s data through event handler content attributes and the firing of events. The use of events and event handlers allows separate code blocks the ability to monitor the progress of the read (which is particularly useful for remote drives or mounted drives, where file access performance may vary from local drives) and error conditions that may arise during reading of a file. An example will be illustrative.

In the example below, different code blocks handle progress, error, and success conditions.
function startRead() {
  // obtain input element through DOM

  var file = document.getElementById('file').files[0];
  if(file){
    getAsText(file);
  }
}

function getAsText(readFile) {

  var reader = new FileReader();

  // Read file into memory as UTF-16
  reader.readAsText(readFile, "UTF-16");

  // Handle progress, success, and errors
  reader.onprogress = updateProgress;
  reader.onload = loaded;
  reader.onerror = errorHandler;
}

function updateProgress(evt) {
  if (evt.lengthComputable) {
    // evt.loaded and evt.total are ProgressEvent properties
    var loaded = (evt.loaded / evt.total);
    if (loaded < 1) {
      // Increase the prog bar length
      // style.width = (loaded * 200) + "px";
    }
  }
}

function loaded(evt) {
  // Obtain the read file data
  var fileString = evt.target.result;
  // Handle UTF-16 file dump
  if(utils.regexp.isChinese(fileString)) {
    //Chinese Characters + Name validation
  }
  else {
    // run other charset test
  }
  // xhr.send(fileString)
}

function errorHandler(evt) {
  if(evt.target.error.name == "NotReadableError") {
    // The file could not be read
  }
}

2. Terminology and Algorithms

When this specification says to terminate an algorithm the user agent must terminate the algorithm after finishing the step it is on. Asynchronous read methods defined in this specification may return before the algorithm in question is terminated, and can be terminated by an abort() call.

The algorithms and steps in this specification use the following mathematical operations:

The term Unix Epoch is used in this specification to refer to the time 00:00:00 UTC on January 1 1970 (or 1970-01-01T00:00:00Z ISO 8601); this is the same time that is conceptually "0" in ECMA-262 [ECMA-262].

The slice blob algorithm given a Blob blob, start, end, and contentType is used to refer to the following steps and returns a new Blob containing the bytes ranging from the start parameter up to but not including the end parameter. It must act as follows:
  1. Let originalSize be blob’s size.

  2. The start parameter, if non-null, is a value for the start point of a slice blob call, and must be treated as a byte-order position, with the zeroth position representing the first byte. User agents must normalize start according to the following:

    1. If start is null, let relativeStart be 0.
    2. If start is negative, let relativeStart be max((originalSize + start), 0).
    3. Otherwise, let relativeStart be min(start, originalSize).
  3. The end parameter, if non-null. is a value for the end point of a slice blob call. User agents must normalize end according to the following:

    1. If end is null, let relativeEnd be originalSize.
    2. If end is negative, let relativeEnd be max((originalSize + end), 0).
    3. Otherwise, let relativeEnd be min(end, originalSize).
  4. The contentType parameter, if non-null, is used to set the ASCII-encoded string in lower case representing the media type of the Blob. User agents must normalize contentType according to the following:

    1. If contentType is null, let relativeContentType be set to the empty string.
    2. Otherwise, let relativeContentType be set to contentType and run the substeps below:
      1. If relativeContentType contains any characters outside the range of U+0020 to U+007E, then set relativeContentType to the empty string and return from these substeps.

      2. Convert every character in relativeContentType to ASCII lowercase.

  5. Let span be max((relativeEnd - relativeStart), 0).

  6. Return a new Blob object S with the following characteristics:

    1. S refers to span consecutive bytes from blob’s associated byte sequence, beginning with the byte at byte-order position relativeStart.
    2. S.size = span.
    3. S.type = relativeContentType.

3. The Blob Interface and Binary Data

A Blob object refers to a byte sequence, and has a size attribute which is the total number of bytes in the byte sequence, and a type attribute, which is an ASCII-encoded string in lower case representing the media type of the byte sequence.

Each Blob must have an internal snapshot state, which must be initially set to the state of the underlying storage, if any such underlying storage exists. Further normative definition of snapshot state can be found for Files.

[Exposed=(Window,Worker), Serializable]
interface Blob {
  constructor(optional sequence<BlobPart> blobParts,
              optional BlobPropertyBag options = {});

  readonly attribute unsigned long long size;
  readonly attribute DOMString type;

  // slice Blob into byte-ranged chunks
  Blob slice(optional [Clamp] long long start,
            optional [Clamp] long long end,
            optional DOMString contentType);

  // read from the Blob.
  [NewObject] ReadableStream stream();
  [NewObject] Promise<USVString> text();
  [NewObject] Promise<ArrayBuffer> arrayBuffer();
  [NewObject] Promise<Uint8Array> bytes();
};

enum EndingType { "transparent", "native" };

dictionary BlobPropertyBag {
  DOMString type = "";
  EndingType endings = "transparent";
};

typedef (BufferSource or Blob or USVString) BlobPart;

Blob objects are serializable objects. Their serialization steps, given value and serialized, are:

  1. Set serialized.[[SnapshotState]] to value’s snapshot state.

  2. Set serialized.[[ByteSequence]] to value’s underlying byte sequence.

Their deserialization step, given serialized and value, are:

  1. Set value’s snapshot state to serialized.[[SnapshotState]].

  2. Set