When createAnswer is called for the first time after a remote description has been provided, the result is known as the initial answer. If no remote description has been installed, an answer cannot be generated, and an error MUST be returned.

Note that the remote description SDP may not have been created by a JSEP endpoint and may not conform to all the requirements listed in Section 5.2. In many cases, this is not a problem. However, if any mandatory SDP attributes are missing or functionality listed as mandatory-to-use above is not present, this MUST be treated as an error and MUST cause the affected "m=" sections to be marked as rejected.

The first step in generating an initial answer is to generate session-level attributes. The process here is identical to that indicated in Section 5.2.1 above, except that the "a=ice-options" line, with the "trickle" option as specified in [RFC8840], Section 4.1.3 and the "ice2" option as specified in [RFC8445], Section 10, is only included if such an option was present in the offer.
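The "a=ice-options" rule above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function name and its list-based input are hypothetical, and it assumes the rule is applied per option, so that each of "trickle" and "ice2" is echoed only when the offer carried it.

```python
from typing import List, Optional


def answer_ice_options(offered_options: List[str]) -> Optional[str]:
    """Build the answer's "a=ice-options" line, or None if it is omitted.

    offered_options holds the option tags found in the offer's
    "a=ice-options" line (an empty list if the line was absent).
    """
    # Assumption: each option is echoed only if the offer contained it.
    echoed = [opt for opt in ("trickle", "ice2") if opt in offered_options]
    if not echoed:
        return None
    return "a=ice-options:" + " ".join(echoed)
```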

The next step is to generate session-level lip sync groups, as defined in [RFC5888], Section 7. For each group of type "LS" present in the offer, select the local RtpTransceivers that are referenced by the MID values in the specified group, and determine which of them either reference a common local MediaStream (specified in the calls to addTrack/addTransceiver used to create them) or have no MediaStream to reference because they were not created by addTrack/addTransceiver. If at least two such RtpTransceivers exist, a group of type "LS" with the MID values of these RtpTransceivers MUST be added. Otherwise, the offered "LS" group MUST be ignored and no corresponding group generated in the answer.

As a simple example, consider the following offer of a single audio and single video track contained in the same MediaStream. SDP lines not relevant to this example have been removed for clarity. As explained in Section 5.2, a group of type "LS" has been added that references each track's RtpTransceiver.

          a=group:LS a1 v1
          m=audio 10000 UDP/TLS/RTP/SAVPF 0
          a=mid:a1
          a=msid:ms1
          m=video 10001 UDP/TLS/RTP/SAVPF 96
          a=mid:v1
          a=msid:ms1

If the answerer uses a single MediaStream when it adds its tracks, both of its transceivers will reference this stream, and so the subsequent answer will contain an "LS" group identical to that in the offer, as shown below:

          a=group:LS a1 v1
          m=audio 20000 UDP/TLS/RTP/SAVPF 0
          a=mid:a1
          a=msid:ms2
          m=video 20001 UDP/TLS/RTP/SAVPF 96
          a=mid:v1
          a=msid:ms2

However, if the answerer groups its tracks into separate MediaStreams, its transceivers will reference different streams, and so the subsequent answer will not contain an "LS" group.

          m=audio 20000 UDP/TLS/RTP/SAVPF 0
          a=mid:a1
          a=msid:ms2a
          m=video 20001 UDP/TLS/RTP/SAVPF 96
          a=mid:v1
          a=msid:ms2b

Finally, if the answerer does not add any tracks, its transceivers will not reference any MediaStreams, causing the preferences of the offerer to be maintained, and so the subsequent answer will contain an identical "LS" group.

          a=group:LS a1 v1
          m=audio 20000 UDP/TLS/RTP/SAVPF 0
          a=mid:a1
          a=recvonly
          m=video 20001 UDP/TLS/RTP/SAVPF 96
          a=mid:v1
          a=recvonly

The example in Section 7.2 shows a more involved case of "LS" group generation.
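The "LS" group selection described above can be sketched as follows. This is one possible reading, under stated assumptions: each transceiver references at most one local MediaStream, transceivers with no stream may join any candidate set, and when several streams are referenced the largest resulting set is chosen. The function name and the MID-to-stream mapping are hypothetical and are not part of any API.

```python
from typing import Dict, List, Optional


def answer_ls_group(offered_mids: List[str],
                    local_streams: Dict[str, Optional[str]]) -> Optional[str]:
    """Return an "a=group:LS ..." line for one offered LS group, or None.

    local_streams maps a MID to the id of the local MediaStream referenced
    by its transceiver, or to None when the transceiver references none.
    """
    # Only MIDs that have a local transceiver can take part.
    mids = [m for m in offered_mids if m in local_streams]

    # Transceivers with no stream qualify on their own.
    best = [m for m in mids if local_streams[m] is None]

    # Try each referenced stream as the "common" stream (assumption:
    # keep the candidate that yields the most members).
    seen: List[str] = []
    for m in mids:
        stream = local_streams[m]
        if stream is None or stream in seen:
            continue
        seen.append(stream)
        members = [x for x in mids if local_streams[x] in (stream, None)]
        if len(members) > len(best):
            best = members

    # Fewer than two transceivers: the offered group is ignored.
    if len(best) < 2:
        return None
    return "a=group:LS " + " ".join(best)
```

Applied to the three answers shown above, the mappings {a1: ms2, v1: ms2} and {a1: None, v1: None} both produce the group "a1 v1", while {a1: ms2a, v1: ms2b} produces no group.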

The next step is to generate an "m=" section for each "m=" section that is present in the remote offer, as specified in [RFC3264], Section 6. For the purposes of this discussion, any session-level attributes in the offer that are also valid as media-level attributes are considered to be present in each "m=" section. Each offered "m=" section will have an associated RtpTransceiver, as described in Section 5.10. If there are more RtpTransceivers than there are "m=" sections, the unmatched RtpTransceivers will need to be associated in a subsequent offer.

For each offered "m=" section, if any of the following conditions are true, the corresponding "m=" section in the answer MUST be marked as rejected by setting the <port> in the "m=" line to zero, as indicated in [RFC3264], Section 6, and further processing for this "m=" section can be skipped:

  • The associated RtpTransceiver has been stopped.
  • There is no offered media format that is both supported and, if applicable, allowed by codec preferences.
  • The bundle policy is "max-bundle", and this is not the first "m=" section or in the same bundle group as the first "m=" section.
  • The bundle policy is "balanced", and this is not the first "m=" section for this media type or in the same bundle group as the first "m=" section for this media type.
  • This "m=" section is in a bundle group, and the group's offerer-tagged "m=" section is being rejected due to one of the above reasons. This requires all "m=" sections in the bundle group to be rejected, as specified in [RFC9143], Section 7.3.3.
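The rejection conditions above can be sketched as a single pass over the offered "m=" sections. This is an illustrative simplification: the dictionary layout, the function name, and the use of one flat set of supported formats are assumptions, and the first MID of each bundle group is taken to identify the offerer-tagged "m=" section.

```python
from typing import Dict, List, Set


def rejected_mids(sections: List[Dict], bundle_groups: List[List[str]],
                  supported: Set[str], policy: str) -> Set[str]:
    """Return the MIDs whose answer "m=" sections get a <port> of zero.

    Each section is a dict with keys "mid", "media", "formats", "stopped".
    supported holds formats that are both supported and, if applicable,
    allowed by codec preferences.
    """
    group_of = {mid: i for i, g in enumerate(bundle_groups) for mid in g}

    def same_group(a: str, b: str) -> bool:
        return a in group_of and group_of[a] == group_of.get(b)

    first_mid = sections[0]["mid"] if sections else None
    first_of_type: Dict[str, str] = {}
    for s in sections:
        first_of_type.setdefault(s["media"], s["mid"])

    rejected: Set[str] = set()
    for s in sections:
        mid = s["mid"]
        if s["stopped"]:
            rejected.add(mid)
        elif not any(f in supported for f in s["formats"]):
            rejected.add(mid)
        elif policy == "max-bundle":
            if mid != first_mid and not same_group(mid, first_mid):
                rejected.add(mid)
        elif policy == "balanced":
            anchor = first_of_type[s["media"]]
            if mid != anchor and not same_group(mid, anchor):
                rejected.add(mid)

    # A rejected offerer-tagged section rejects its whole bundle group.
    for group in bundle_groups:
        if group and group[0] in rejected:
            rejected.update(group)
    return rejected
```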

Otherwise, each "m=" section in the answer MUST then be generated as specified in [RFC3264], Section 6.1. For the "m=" line itself, the following rules MUST be followed:

  • The <port> value would normally be set to the port of the default ICE candidate for this "m=" section, but given that no candidates are available yet, the default <port> value of 9 (Discard) MUST be used, as indicated in [RFC8840], Section 4.1.1.
  • The <proto> field MUST be set to exactly match the <proto> field for the corresponding "m=" line in the offer.
  • If codec preferences have been set for the associated transceiver, media formats MUST be generated in the corresponding order, regardless of what was offered, and MUST exclude any codecs not present in the codec preferences.
  • Otherwise, the media formats on the "m=" line MUST be generated in the same order as those offered in the current remote description, excluding any currently unsupported formats. Any currently available media formats that are not present in the current remote description MUST be added after all existing formats.
  • In either case, the media formats in the answer MUST include at least one format that is present in the offer but MAY include formats that are locally supported but not present in the offer, as mentioned in [RFC3264], Section 6.1. If no common format exists, the "m=" section is rejected as described above.
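The media format ordering rules above can be sketched as follows. For readability, this sketch identifies formats by codec name rather than by payload type number, which is a simplification; the function name and parameters are hypothetical. A return value of None stands for the case where no common format exists and the "m=" section is rejected.

```python
from typing import List, Optional


def answer_formats(offered: List[str], supported: List[str],
                   preferences: Optional[List[str]] = None) -> Optional[List[str]]:
    """Order the media formats for one answer "m=" line.

    offered:     formats from the current remote description, in order.
    supported:   currently available local formats, in local order.
    preferences: codec preferences for the transceiver, if any were set.
    """
    if preferences is not None:
        # Preference order wins regardless of what was offered, and
        # codecs outside the preferences are excluded.
        formats = [f for f in preferences if f in supported]
    else:
        # Offered order first, minus unsupported formats ...
        formats = [f for f in offered if f in supported]
        # ... then locally available formats that were not offered.
        formats += [f for f in supported if f not in offered]

    # At least one format must be common with the offer.
    if not any(f in offered for f in formats):
        return None
    return formats
```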

The "m=" line MUST be followed immediately by a "c=" line, as specified in [RFC4566], Section 5.7. Again, as no candidates are available yet, the "c=" line MUST contain the default value "IN IP4 0.0.0.0", as defined in [RFC8840], Section 4.1.3.
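As a small illustration of the "m=" and "c=" line rules above, the following sketch builds both lines for a non-rejected section before any candidates have been gathered. The helper name and its inputs are hypothetical; the format list is assumed to have been selected already.

```python
from typing import List


def answer_media_lines(offer_m_line: str, formats: List[str]) -> List[str]:
    """Return the answer's "m=" line followed by its default "c=" line.

    offer_m_line has the form "m=<media> <port> <proto> <fmt> ...".
    """
    media, _offered_port, proto = offer_m_line[len("m="):].split()[:3]
    return [
        # Port 9 (Discard) is the default while no candidates exist;
        # <proto> is copied from the offer unchanged.
        f"m={media} 9 {proto} {' '.join(formats)}",
        "c=IN IP4 0.0.0.0",
    ]
```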

If the offer supports bundle, all "m=" sections to be bundled MUST use the same ICE credentials and candidates; all "m=" sections not being bundled MUST use unique ICE credentials and candidates. Each "m=" section MUST contain the following attributes (which are of attribute types other than IDENTICAL or TRANSPORT):

Each "m=" section that is not bundled into another "m=" section MUST contain the following attributes (which are of category IDENTICAL or TRANSPORT):

  • "a=ice-ufrag" and "a=ice-pwd" lines, as specified in [RFC8839], Section 5.4.
  • For each desired digest algorithm, one or more "a=fingerprint" lines for each of the endpoint's certificates, as specified in [RFC8122], Section 5.
  • An "a=setup" line, as specified in [RFC4145], Section 4 and clarified for use in DTLS-SRTP scenarios in [RFC5763], Section 5. The role value in the answer MUST be "active" or "passive". When the offer contains the "actpass" value, as will always be the case with JSEP endpoints, the answerer SHOULD use the "active" role. Offers from non-JSEP endpoints MAY send other values for "a=setup", in which case the answer MUST use a value consistent with the value in the offer.
  • An "a=tls-id" line, as specified in [RFC8842], Section 5.3.
  • If present in the offer, an "a=rtcp-mux" line, as specified in [RFC5761], Section 5.1.3. Otherwise, an "a=rtcp" line, as specified in [RFC3605], Section 2.1, containing the default value "9 IN IP4 0.0.0.0" (because no candidates have yet been gathered).
  • If present in the offer, an "a=rtcp-rsize" line, as specified in [RFC5506], Section 5.
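Two of the attribute choices in the list above lend themselves to a short sketch: selecting the "a=setup" role and choosing between "a=rtcp-mux" and the default "a=rtcp" line. The function names are hypothetical. Treating an offered value other than "actpass", "active", or "passive" as an error is an assumption of this sketch, made because the answer's role is limited to "active" or "passive".

```python
from typing import List, Set


def answer_setup_role(offered_setup: str) -> str:
    """Pick the answer's "a=setup" role from the offered value."""
    if offered_setup == "actpass":
        return "active"    # the recommended (SHOULD) choice
    if offered_setup == "active":
        return "passive"   # consistent with an active offerer
    if offered_setup == "passive":
        return "active"    # consistent with a passive offerer
    raise ValueError("no consistent answer role for " + repr(offered_setup))


def answer_rtcp_lines(offered_attrs: Set[str]) -> List[str]:
    """Return the RTCP-related lines for a non-bundled "m=" section.

    offered_attrs holds the attribute names present in the offered section.
    """
    lines = []
    if "rtcp-mux" in offered_attrs:
        lines.append("a=rtcp-mux")
    else:
        # Default value, because no candidates have been gathered yet.
        lines.append("a=rtcp:9 IN IP4 0.0.0.0")
    if "rtcp-rsize" in offered_attrs:
        lines.append("a=rtcp-rsize")
    return lines
```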

If a data channel "m=" section has been offered, an "m=" section MUST also be generated for data. The <media> field MUST be set to "application", and the <proto> and <fmt> fields MUST be set to exactly match the fields in the offer.

Within the data "m=" section, an "a=mid" line MUST be generated and included as described above, along with an "a=sctp-port" line referencing the SCTP port number, as defined in [RFC8841], Section 5.1; and, if appropriate, an "a=max-message-size" line, as defined in [RFC8841], Section 6.1.
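The data "m=" section rules above can be sketched as follows. The helper and its parameters are hypothetical. Using port 9 and the default "c=" line here is an assumption carried over from the rules for media "m=" sections, since no candidates are available yet.

```python
from typing import List, Optional


def answer_data_section(offer_m_line: str, mid: str, sctp_port: int,
                        max_message_size: Optional[int] = None) -> List[str]:
    """Build the core lines of the answer's data "m=" section."""
    fields = offer_m_line[len("m="):].split()
    media, proto, fmt = fields[0], fields[2], fields[3:]
    if media != "application":
        raise ValueError("not a data m= section")
    lines = [
        # <proto> and <fmt> are copied from the offer unchanged.
        f"m=application 9 {proto} {' '.join(fmt)}",
        "c=IN IP4 0.0.0.0",  # assumption: same default as media sections
        f"a=mid:{mid}",
        f"a=sctp-port:{sctp_port}",
    ]
    if max_message_size is not None:
        lines.append(f"a=max-message-size:{max_message_size}")
    return lines
```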

As discussed above, the following attributes of category IDENTICAL or TRANSPORT are included only if the data "m=" section is not bundled into another "m=" section:

  • "a=ice-ufrag"
  • "a=ice-pwd"
  • "a=fingerprint"
  • "a=setup"
  • "a=tls-id"

Note that if media "m=" sections are bundled into a data "m=" section, then certain TRANSPORT and IDENTICAL attributes may also appear in the data "m=" section even if they would otherwise only be appropriate for a media "m=" section (e.g., "a=rtcp-mux").

If "a=group" attributes with semantics "BUNDLE" are offered, corresponding session-level "a=group" attributes MUST be added as specified in [RFC5888]. These attributes MUST have semantics "BUNDLE" and MUST include all MID identifiers from the offered bundle groups that have not been rejected. Note that regardless of the presence of "a=bundle-only" in the offer, all "m=" sections in the answer MUST NOT have an "a=bundle-only" line.
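The BUNDLE group rules above can be sketched as follows. The function names are hypothetical, and omitting a group entirely when every one of its MIDs has been rejected is an assumption of this sketch rather than something stated in the text.

```python
from typing import List, Set


def answer_bundle_groups(offered_groups: List[List[str]],
                         rejected: Set[str]) -> List[str]:
    """Return the session-level "a=group:BUNDLE" lines for the answer."""
    lines = []
    for group in offered_groups:
        # Keep the offered order, dropping rejected MIDs.
        kept = [mid for mid in group if mid not in rejected]
        if kept:  # assumption: an emptied group is not emitted
            lines.append("a=group:BUNDLE " + " ".join(kept))
    return lines


def strip_bundle_only(section_lines: List[str]) -> List[str]:
    """Drop "a=bundle-only", which never appears in an answer."""
    return [line for line in section_lines if line != "a=bundle-only"]
```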

Attributes that are common between all "m=" sections MAY be moved to the session level if explicitly defined to be valid at the session level.

The attributes prohibited in the creation of offers are also prohibited in the creation of answers.