V8 Intro

How does Chrome execute JavaScript?

Liam Wachter, Julian Gremminger

Karlsruhe Institute of Technology

About Us

Julian Gremminger

KIT Computer Science Student
pwn/rev main CTF player @ KITCTF and orgakraut
Found one V8 bug and counting

Liam Wachter

KIT Computer Science Student
rev/pwn main CTF player @ KITCTF, polygl0ts, orga{niser,kraut}
Uses ML for vulnerability discovery @ SAP Security Research

What happens when Chromium displays sadsmileyface.xyz?

The content module (//content): reusable part of Chromium, used e.g. in Edge or Android WebView
Blink is the rendering engine, forked from WebKit, which itself was forked from KDE's KHTML
gin is a wrapper around the V8 API with a focus on C++/JS bindings
The V8 API is also heavily used directly, primarily in //renderer/bindings
DOM objects are exposed to JavaScript through V8 FunctionTemplates and ObjectTemplates

HTML parsing

Parsed twice: a fast, partial, approximate pass and a slow, exact one

Used to be the source of more vulnerabilities, but still has some

V8

🎉 Mostly unsafe C++ 🎉
Interpreter, baseline compiler, 2 optimizing compilers
> 2M LOC

V8 is a standalone library

E.g. also used by Node.js

One instance of V8 operates on one v8::Isolate
An isolate can have multiple v8::Contexts
//renderer/bindings auto-generates the C++ V8 API code for DOM interaction from WebIDL

[CustomToV8]
interface Node {
    const unsigned short ELEMENT_NODE = 1;
    attribute Node parentNode;
    [TreatReturnedNullStringAs=Null] attribute DOMString nodeName;
    [Custom] Node appendChild(Node newChild);
    void addEventListener(DOMString type, EventListener listener,
                          optional boolean useCapture);
};
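The split between isolates and contexts is visible from Node.js, which also embeds V8: its `vm` module creates additional V8 contexts inside the same isolate, each with its own global object and built-ins. A minimal sketch, assuming it runs under Node.js (the `shared` property is an arbitrary example name):

```javascript
const vm = require("vm");

// A new V8 context: its own global object and its own set of built-ins.
const sandbox = vm.createContext({ shared: 41 });

// Code run in the new context sees that context's globals.
const result = vm.runInContext("shared + 1", sandbox);

// Built-ins are per context, so the Array constructors differ
// and instanceof checks across contexts fail.
const foreignArray = vm.runInContext("[1, 2, 3]", sandbox);
const sameArrayCtor = vm.runInContext("Array", sandbox) === Array;

console.log(result);                        // 42
console.log(sameArrayCtor);                 // false
console.log(foreignArray instanceof Array); // false
console.log(Array.isArray(foreignArray));   // true
```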

What does V8 do with this?

function printA(obj) {
    console.log(obj.a);
}

for (let i = 0; i < 10000; i++) {
    printA({a: i});
}

Parsing

V8 gets the script as text (or as StreamedSource)

Hand-written recursive descent parser

Separate passes for AST building and scope analysis (hard)


--- AST ---
FUNC at 0
. KIND 0
. LITERAL ID 0
. SUSPEND COUNT 0
. NAME ""
. INFERRED NAME ""
. DECLS
. . FUNCTION "printA" = function printA
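A hand-written recursive descent parser has one function per grammar rule, and each function calls the functions of its sub-rules. A toy sketch for arithmetic expressions; the grammar and the node shapes are assumptions and far simpler than V8's parser and AST:

```javascript
// Grammar, one function per rule:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := NUMBER | IDENT | "(" expr ")"
function parse(source) {
  // Tokenizer: numbers, identifiers, operators; characters matching no token
  // (e.g. whitespace) are skipped.
  const tokens = source.match(/\d+|[A-Za-z_]\w*|[-+*\/()]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function expr() {
    let node = term();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      node = { kind: "BinaryOp", op, left: node, right: term() };
    }
    return node;
  }

  function term() {
    let node = factor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      node = { kind: "BinaryOp", op, left: node, right: factor() };
    }
    return node;
  }

  function factor() {
    const tok = next();
    if (tok === undefined) throw new SyntaxError("unexpected end of input");
    if (tok === "(") {
      const inner = expr();
      if (next() !== ")") throw new SyntaxError("expected )");
      return inner;
    }
    if (/^\d+$/.test(tok)) return { kind: "Literal", value: Number(tok) };
    if (/^[A-Za-z_]\w*$/.test(tok)) return { kind: "Identifier", name: tok };
    throw new SyntaxError(`unexpected token ${tok}`);
  }

  const ast = expr();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token ${peek()}`);
  return ast;
}
```

For example, `parse("1 + 2 * x")` yields a `+` node whose right child is the `*` node, because `term` binds tighter than `expr`.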

Interpreter: Ignition

Unoptimized lowering to bytecode
Register machine with implicit accumulator executes bytecode
One instruction dispatches the next

IGNITION_HANDLER(Star, InterpreterAssembler) {
  TNode<Object> accumulator = GetAccumulator();
  StoreRegisterAtOperandIndex(accumulator, 0);
  Dispatch();
}

void InterpreterAssembler::Dispatch() {
  Comment("========= Dispatch");
  DCHECK_IMPLIES(Bytecodes::MakesCallAlongCriticalPath(bytecode_), made_call_);
  TNode<IntPtrT> target_offset = Advance();
  TNode<WordT> target_bytecode = LoadBytecode(target_offset);
  DispatchToBytecodeWithOptionalStarLookahead(target_bytecode);
}

Insanely long call chain to arrive at the code doing the actual work
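A toy version of the accumulator design: most bytecodes read or write the implicit accumulator, and `Star` copies it into a register. The bytecode names mimic Ignition's, but operands and semantics are simplified assumptions, and a central loop stands in for Ignition's handler-to-handler dispatch:

```javascript
function run(bytecode, registerCount) {
  const registers = new Array(registerCount).fill(undefined);
  let accumulator;
  let pc = 0;

  const handlers = {
    LdaSmi: (imm) => { accumulator = imm; },                       // acc = imm
    Ldar: (reg) => { accumulator = registers[reg]; },              // acc = r[reg]
    Star: (reg) => { registers[reg] = accumulator; },              // r[reg] = acc
    Add: (reg) => { accumulator = registers[reg] + accumulator; }, // acc = r[reg] + acc
  };

  // Dispatch loop: fetch the next bytecode, advance pc, run its handler.
  for (;;) {
    const [op, operand] = bytecode[pc++];
    if (op === "Return") return accumulator;
    const handler = handlers[op];
    if (!handler) throw new Error(`unknown bytecode ${op}`);
    handler(operand);
  }
}

// Roughly obj.a + obj.a with obj.a == 21: load, copy to r0, add r0.
const program = [["LdaSmi", 21], ["Star", 0], ["Add", 0], ["Return"]];
console.log(run(program, 1)); // 42
```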

Feedback

What kind of types do we encounter during execution?

function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}


function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- str + str
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i.toString()});
}

Speculate based on the type feedback we collect
Assume that future execution will also have those types
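A sketch of how such feedback can be recorded for the `+` in `printAx2`: each execution classifies its operands, and the stored hint only ever widens. The hint names echo V8's `BinaryOperationHint`, but the lattice and the assumed 31-bit small-integer range are simplifications:

```javascript
// Classify one execution of `a + b`.
function classify(a, b) {
  const isSmall = (v) => Number.isInteger(v) && v >= -(2 ** 30) && v < 2 ** 30;
  if (isSmall(a) && isSmall(b)) return "SignedSmall";
  if (typeof a === "number" && typeof b === "number") return "Number";
  if (typeof a === "string" && typeof b === "string") return "String";
  return "Any";
}

// Least upper bound of two hints: feedback never narrows again.
function join(h1, h2) {
  if (h1 === h2 || h2 === "None") return h1;
  if (h1 === "None") return h2;
  const numeric = ["SignedSmall", "Number"];
  if (numeric.includes(h1) && numeric.includes(h2)) return "Number";
  return "Any";
}

// One feedback slot attached to one `+` in the bytecode.
class BinaryOpFeedback {
  constructor() {
    this.hint = "None";
  }
  record(a, b) {
    this.hint = join(this.hint, classify(a, b));
    return a + b;
  }
}

const slot = new BinaryOpFeedback();
for (let i = 0; i < 10000; i++) slot.record(i, i);
console.log(slot.hint); // SignedSmall
```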

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}

Monomorphic

In bytecode this is:


GetProperty(obj, "a", feedback_cache)

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int | str + str
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}
for (let i = 0; i < 10000; i++) {
    printAx2({a: i.toString()});
}

Polymorphic

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a);
}

printAx2({a: 42});
printAx2({a: 0.23});
printAx2({a: "asdf"});
printAx2({a: {lmao: 7}});
printAx2({a: BigInt(1337)});

Megamorphic
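A sketch of the state machine behind these three slides for a single property-access site. The object's hidden class is approximated by a hypothetical `shapeOf` string (property names plus a rough value kind), and the polymorphic limit of 4 is an assumption:

```javascript
const POLYMORPHIC_LIMIT = 4; // assumed maximum number of shapes before megamorphic

// Rough value kind; Number.isInteger stands in for V8's small-integer check.
function kindOf(value) {
  if (typeof value === "number") return Number.isInteger(value) ? "smi" : "double";
  return typeof value;
}

// Hypothetical stand-in for a hidden class: property names plus value kinds.
function shapeOf(obj) {
  return Object.keys(obj).map((key) => `${key}:${kindOf(obj[key])}`).join(",");
}

class InlineCache {
  constructor() {
    this.state = "uninitialized";
    this.shapes = new Set();
  }

  // Called on every execution of the access site, e.g. obj.a.
  record(obj) {
    if (this.state === "megamorphic") return this.state; // stays megamorphic
    this.shapes.add(shapeOf(obj));
    if (this.shapes.size === 1) {
      this.state = "monomorphic";
    } else if (this.shapes.size <= POLYMORPHIC_LIMIT) {
      this.state = "polymorphic";
    } else {
      this.state = "megamorphic";
      this.shapes.clear(); // individual shapes are no longer tracked
    }
    return this.state;
  }
}

const site = new InlineCache();
[{ a: 42 }, { a: 0.23 }, { a: "asdf" }, { a: { lmao: 7 } }, { a: BigInt(1337) }]
  .forEach((obj) => console.log(site.record(obj)));
// monomorphic, polymorphic, polymorphic, polymorphic, megamorphic (one per line)
```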

🔥 Hotness 🔥

Functions that get executed a lot with the same feedback get hot 🔥

for (let i = 0; i < 10000; i++) {
    printA({a: i});
}

Trade-off: compilation time vs. execution speed
Tier-up (further optimize) hot functions
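A sketch of counter-based tier-up. The thresholds are made-up numbers, and real V8 also weighs budgets and feedback stability rather than plain call counts; `makeTieredFunction` is a hypothetical helper:

```javascript
// Hypothetical thresholds, lowest tier first.
const TIERS = [
  { name: "Ignition", minCalls: 0 },
  { name: "Sparkplug", minCalls: 10 },
  { name: "Maglev", minCalls: 100 },
  { name: "Turbofan", minCalls: 1000 },
];

function makeTieredFunction(fn) {
  let calls = 0;
  let tier = TIERS[0].name;
  const wrapped = (...args) => {
    calls++;
    // Move to the highest tier whose threshold has been reached.
    for (const t of TIERS) if (calls >= t.minCalls) tier = t.name;
    return fn(...args);
  };
  wrapped.currentTier = () => tier;
  return wrapped;
}

const printA = makeTieredFunction((obj) => obj.a);
for (let i = 0; i < 10000; i++) printA({ a: i });
console.log(printA.currentTier()); // Turbofan
```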

Bailouts / Deoptimization

What happens when assumptions are broken?

function printAx2(obj) {
    console.log(obj.a + obj.a);
}

// optimize printAx2
for (let i = 0; i < 10000; i++) {
    printAx2({a: i}); // <-- int
}

printAx2({a: "asdf"}); // <-- str

Optimized code is not correct anymore!
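A sketch of what the optimized code conceptually does: it guards the speculated type and bails out to generic code when the guard fails. Real deoptimization also rebuilds the interpreter frame and discards the optimized code; this sketch only counts bailouts, and it returns the sum instead of logging it:

```javascript
let bailouts = 0;

// Generic version: handles any type of obj.a.
function printAx2Generic(obj) {
  return obj.a + obj.a;
}

// "Optimized" version, specialized on the feedback "obj.a is a small integer".
function printAx2Optimized(obj) {
  const a = obj.a;
  // Guard on the speculated type; (a | 0) === a holds only for int32 numbers.
  if (typeof a !== "number" || (a | 0) !== a) {
    bailouts++;
    return printAx2Generic(obj); // bailout: continue in generic code
  }
  return a * 2; // fast path, only valid under the guard
}

console.log(printAx2Optimized({ a: 21 }));     // 42
console.log(printAx2Optimized({ a: "asdf" })); // asdfasdf, after one bailout
```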

Sparkplug

Baseline compiler
No speculation, no feedback, no optimization
Generates machine code

__ CodeEntry();
{
    RCS_BASELINE_SCOPE(Visit);
    Prologue();
    AddPosition();
    for (; !iterator_.done(); iterator_.Advance()) {
      VisitSingleBytecode();
      AddPosition();
    }
}

void BaselineCompiler::VisitLdaZero() {
  __ Move(kInterpreterAccumulatorRegister,
          Smi::FromInt(0));
}

void BaselineAssembler::Move(
        interpreter::Register output, 
        Register source) {
  return __ movq(RegisterFrameOperand(output),
                 source);
}
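The loop above is the core of a template compiler: one linear pass over the bytecode, emitting a fixed snippet per bytecode, with no analysis. A sketch of the same idea, where generated JavaScript source (turned into a function with `new Function`) stands in for machine code; the bytecode format is a hypothetical simplification, not V8's:

```javascript
function compileBaseline(bytecode, registerCount) {
  const lines = [`const r = new Array(${registerCount | 0});`, "let acc;"];
  for (const [op, operand] of bytecode) {
    const k = operand | 0; // operands are small integers; `| 0` keeps the source safe
    switch (op) {
      case "LdaZero": lines.push("acc = 0;"); break;
      case "LdaSmi":  lines.push(`acc = ${k};`); break;
      case "Ldar":    lines.push(`acc = r[${k}];`); break;
      case "Star":    lines.push(`r[${k}] = acc;`); break;
      case "Add":     lines.push(`acc = r[${k}] + acc;`); break;
      case "Return":  lines.push("return acc;"); break;
      default: throw new Error(`unsupported bytecode ${op}`);
    }
  }
  return new Function(lines.join("\n"));
}

const compiled = compileBaseline([["LdaSmi", 21], ["Star", 0], ["Add", 0], ["Return"]], 1);
console.log(compiled()); // 42
```

No dispatch loop remains in the generated code, but nothing is optimized across bytecodes either.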

Maglev

New mid-tier compiler
Generates somewhat optimized machine code
Not a lot of optimization passes (yet)
Inlining, direct lowering

Direct lowering

void MaglevGraphBuilder::VisitAdd() { VisitBinaryOperation<Operation::kAdd>(); }

template <Operation kOperation>
void MaglevGraphBuilder::VisitBinaryOperation() {
  ...
  switch (feedback_hint) {
    case BinaryOperationHint::kNone:
      RETURN_VOID_ON_ABORT(EmitUnconditionalDeopt(
          DeoptimizeReason::kInsufficientTypeFeedbackForBinaryOperation));
    case BinaryOperationHint::kSignedSmall:
    case BinaryOperationHint::kSignedSmallInputs:
    case BinaryOperationHint::kNumber:
    case BinaryOperationHint::kNumberOrOddball: {
      ... // Optimized for number operation
    }
    case BinaryOperationHint::kString:
      ... // Optimized for string operation
  }
}
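A sketch of the idea behind `VisitBinaryOperation`: the recorded hint picks a specialized implementation once, at graph-building time, and a failed guard becomes a deoptimization. The `Deopt` class and `lowerAdd` function are hypothetical, and `NumberOrOddball` is left to the generic path for brevity:

```javascript
class Deopt extends Error {}

function lowerAdd(hint) {
  switch (hint) {
    case "None":
      // No feedback yet: the compiled code can only deoptimize.
      return () => { throw new Deopt("InsufficientTypeFeedbackForBinaryOperation"); };
    case "SignedSmall":
    case "SignedSmallInputs":
    case "Number":
      // Number-specialized add, guarded by type checks.
      return (a, b) => {
        if (typeof a !== "number" || typeof b !== "number") throw new Deopt("NotANumber");
        return a + b;
      };
    case "String":
      // String-specialized add (concatenation), guarded by type checks.
      return (a, b) => {
        if (typeof a !== "string" || typeof b !== "string") throw new Deopt("NotAString");
        return a + b;
      };
    default:
      // Generic path: the full semantics of JavaScript `+`.
      return (a, b) => a + b;
  }
}

console.log(lowerAdd("Number")(1, 2));     // 3
console.log(lowerAdd("String")("a", "b")); // ab
```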

Turbofan

Compiles one function at a time
SSA (static single assignment) + graph-based optimization pipeline
Highly specific, optimized code
Relies on feedback from lower tiers

function f(obj, i) {
    if (i % 2 == 0) {
        obj.a = obj.a + 1;
    }
}

Sea of nodes
Value, control, effect edges
Effect edges order stateful operations and track side effects!
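A hand-built sketch of part of the graph for `f` above, with nodes as plain objects holding value, effect and control inputs. The operator names are simplified assumptions rather than Turbofan's exact IR, and the Merge after the `if` is omitted. Pure nodes such as the add carry no effect or control input, so they can float; the load and the store are chained by effect edges:

```javascript
function makeNode(op, { value = [], effect = null, control = null } = {}) {
  return { op, value, effect, control };
}

const start = makeNode("Start");
const obj = makeNode("Parameter[obj]", { control: start });
const i = makeNode("Parameter[i]", { control: start });
const two = makeNode("NumberConstant[2]");
const zero = makeNode("NumberConstant[0]");
const one = makeNode("NumberConstant[1]");

// Condition `i % 2 == 0`: pure value nodes.
const mod = makeNode("NumberModulus", { value: [i, two] });
const isEven = makeNode("NumberEqual", { value: [mod, zero] });
const branch = makeNode("Branch", { value: [isEven], control: start });
const ifTrue = makeNode("IfTrue", { control: branch });

// `obj.a = obj.a + 1`: the store's effect input is the load, fixing their order.
const load = makeNode("LoadField[a]", { value: [obj], effect: start, control: ifTrue });
const add = makeNode("NumberAdd", { value: [load, one] });
const store = makeNode("StoreField[a]", { value: [obj, add], effect: load, control: ifTrue });

// Walk the effect chain backwards from a node.
function effectChain(n) {
  const ops = [];
  for (let cur = n; cur !== null; cur = cur.effect) ops.push(cur.op);
  return ops;
}

console.log(effectChain(store)); // [ 'StoreField[a]', 'LoadField[a]', 'Start' ]
```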

Insanely² complex
🚀 Bugs (were?) everywhere 🚀
Turboshaft (new Turbofan IR) brings more bugs

What happens when side effects are modeled incorrectly?

CVE-2018-17463

- V(CreateObject, Operator::kNoWrite, 1, 1)                            \
+ V(CreateObject, Operator::kNoProperties, 1, 1)                       \

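Object.create (JSCreateObject) was marked kNoWrite, so Turbofan assumed it does not write to the effect chain. But it can have side effects: it changes the map of its argument (the new prototype is converted to dictionary mode). Important map checks were incorrectly removed!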

Optimizations

Most passes run n reducers over the graph until a fixpoint is reached

struct TypedLoweringPhase {
  DECL_PIPELINE_PHASE_CONSTANTS(TypedLowering)
  void Run(PipelineData* data, Zone* temp_zone) {
    ...
    AddReducer(data, &graph_reducer, &dead_code_elimination);
    AddReducer(data, &graph_reducer, &constant_folding_reducer);
    AddReducer(data, &graph_reducer, &simple_reducer);
    AddReducer(data, &graph_reducer, &common_reducer);
    ...
    // Runs all registered reducers until none of them changes the graph
    graph_reducer.ReduceGraph();
  }
};

DeadCodeElimination, ConstantFoldingReducer, MachineOperatorReducer, ...
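A sketch of the fixpoint idea on a tiny expression tree: every reducer is offered every node, bottom-up, and the whole walk repeats until no reducer changes anything. The node format, `reduceToFixpoint`, and both reducers are hypothetical stand-ins for `GraphReducer` and real reducers such as `ConstantFoldingReducer`:

```javascript
function reduceToFixpoint(root, reducers) {
  let changed;
  // Visit inputs first, then offer the node to each reducer in turn.
  const visit = (n) => {
    if (n.inputs) n = { ...n, inputs: n.inputs.map(visit) };
    for (const reduce of reducers) {
      const replacement = reduce(n);
      if (replacement !== n) {
        changed = true;
        n = replacement;
      }
    }
    return n;
  };
  do {
    changed = false;
    root = visit(root);
  } while (changed);
  return root;
}

const isConst = (n) => n.op === "Const";

// Constant folding: Add(Const a, Const b) => Const(a + b).
const constantFolding = (n) =>
  n.op === "Add" && n.inputs.every(isConst)
    ? { op: "Const", value: n.inputs[0].value + n.inputs[1].value }
    : n;

// Algebraic simplification: Add(x, Const 0) => x.
const addZero = (n) =>
  n.op === "Add" && isConst(n.inputs[1]) && n.inputs[1].value === 0 ? n.inputs[0] : n;

const x = { op: "Param", name: "x" };
const c = (value) => ({ op: "Const", value });
const graph = { op: "Add", inputs: [x, { op: "Add", inputs: [c(1), c(-1)] }] };
console.log(reduceToFixpoint(graph, [constantFolding, addZero])); // { op: 'Param', name: 'x' }
```

Folding `1 + -1` to `0` is what enables the `x + 0` rewrite; neither reducer alone would reach `x`.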

Example: MachineOperatorReducer

Reduction MachineOperatorReducer::Reduce(Node* node) {
    ...
    case IrOpcode::kInt32Mul: {
      Int32BinopMatcher m(node);
      if (m.right().Is(0)) return Replace(m.right().node());  // x * 0 => 0
      if (m.right().Is(1)) return Replace(m.left().node());   // x * 1 => x
      if (m.right().Is(-1)) {  // x * -1 => 0 - x
        node->ReplaceInput(0, Int32Constant(0));
        node->ReplaceInput(1, m.left().node());
        NodeProperties::ChangeOp(node, machine()->Int32Sub());
        return Changed(node);
      }
      ...
    }
}
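A sketch of the three `Int32Mul` rewrites above on a hypothetical node format, together with an evaluator using 32-bit wrap-around arithmetic (`Math.imul`, `| 0`) to check that each rewrite preserves the result, including the `x * -1` case at the int32 minimum:

```javascript
function reduceInt32Mul(node) {
  if (node.op !== "Int32Mul") return node;
  const [left, right] = node.inputs;
  if (right.op !== "Int32Constant") return node;
  if (right.value === 0) return right;        // x * 0  => 0
  if (right.value === 1) return left;         // x * 1  => x
  if (right.value === -1) {                   // x * -1 => 0 - x
    return { op: "Int32Sub", inputs: [{ op: "Int32Constant", value: 0 }, left] };
  }
  return node;
}

// Evaluates a node for parameter value x with int32 wrap-around semantics.
function evalInt32(node, x) {
  switch (node.op) {
    case "Param": return x | 0;
    case "Int32Constant": return node.value | 0;
    case "Int32Mul": return Math.imul(evalInt32(node.inputs[0], x), evalInt32(node.inputs[1], x));
    case "Int32Sub": return (evalInt32(node.inputs[0], x) - evalInt32(node.inputs[1], x)) | 0;
    default: throw new Error(`unknown op ${node.op}`);
  }
}

const param = { op: "Param" };
for (const k of [0, 1, -1]) {
  const original = { op: "Int32Mul", inputs: [param, { op: "Int32Constant", value: k }] };
  const reduced = reduceInt32Mul(original);
  for (const x of [0, 7, -7, 2147483647, -2147483648]) {
    console.assert(evalInt32(reduced, x) === evalInt32(original, x), `x=${x}, k=${k}`);
  }
}
```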

Phase examples: Inlining

Classic optimization
Based on the callee's bytecode size and call frequency (also for polymorphic call sites, where the call target is a phi)

function g(i: int) {
    return i & 3;
}

function f(i: int) {
    if (g(i) <= 3) {
        i = 0;
    }
    return i;
}

function f(i: int) {
    if ((i & 3) <= 3) {
        i = 0;
    }
    return i;
}

Enables strong optimization possibilities

function f(i: int) {
    return 0;
}
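The slide's `int` annotations are pseudocode. As plain JavaScript, assuming int32 inputs, all three versions of `f` agree. The inlined condition needs parentheses, because `i & 3 <= 3` parses as `i & (3 <= 3)`, i.e. `i & 1`:

```javascript
function g(i) {
  return i & 3;
}

// Original: calls g.
function fOriginal(i) {
  if (g(i) <= 3) {
    i = 0;
  }
  return i;
}

// After inlining g: the call is replaced by its body.
function fInlined(i) {
  if ((i & 3) <= 3) {
    i = 0;
  }
  return i;
}

// After typing (i & 3 is in Range(0, 3)) and folding: the condition is always true.
function fFolded(_i) {
  return 0;
}

// Pitfall: without parentheses this tests i & (3 <= 3), i.e. i & 1.
function fMisparenthesized(i) {
  if (i & 3 <= 3) {
    i = 0;
  }
  return i;
}

const inputs = [-2147483648, -5, -1, 0, 1, 2, 3, 4, 7, 2147483647];
console.log(inputs.every((i) => fOriginal(i) === fInlined(i) && fInlined(i) === fFolded(i))); // true
console.log(fMisparenthesized(2)); // 2, not 0
```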

Phase examples: Typer

Runs early to associate types with nodes
Internal types (differ from JS "types")
Range(0,1), Signed31, HeapConstant
Further optimizations can consider and refine possible type information

function f(i) {
    return i & 3;
}
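For `f` above, a typer can conclude that `i & 3` lies in `Range(0, 3)` whatever `i` is, because an AND with a non-negative constant can only keep bits of that constant. A sketch of that single rule (the function names are hypothetical; Turbofan's typer covers far more cases), plus an empirical check:

```javascript
// Type of `x & c` for a non-negative int32 constant c: Range(0, c).
function typeBitwiseAndConstant(c) {
  if (!Number.isInteger(c) || c < 0 || c > 0x7fffffff) {
    throw new RangeError("c must be a non-negative int32");
  }
  return { min: 0, max: c };
}

const inRange = (type, value) => value >= type.min && value <= type.max;

const t = typeBitwiseAndConstant(3);
// Inputs of many JS types: all are converted with ToInt32 before the AND.
const samples = [0, 1, 2, 3, 4, -1, -4, 2 ** 31 - 1, -(2 ** 31), 1e10, 0.5, NaN, Infinity, "7", null];
console.log(samples.every((x) => inRange(t, x & 3))); // true
```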

Phase examples: Escape analysis

Objects created in a function that do not escape it can possibly be optimized out
Special consideration needed during bailouts => Rematerialization

function f() {
    var obj = {a: 1337};
    console.log(obj.a);
}

function f() {
    console.log(1337);
}
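A much simplified sketch of the two steps on the slide's example, over a hypothetical straight-line operation list: first check whether the allocation escapes (passed to a call, returned, or stored into another object), then, if it does not, replace its field loads with the tracked values. Rematerialization on bailout is not modeled:

```javascript
// Does the allocation `name` escape?
function escapes(ops, name) {
  return ops.some(
    (op) =>
      (op.kind === "call" && op.args.includes(name)) ||  // passed to a call
      (op.kind === "return" && op.value === name) ||     // returned
      (op.kind === "storeField" && op.value === name)    // stored into an object
  );
}

// Scalar replacement: drop the allocation and track its fields as plain values.
function scalarReplace(ops, name) {
  if (escapes(ops, name)) return ops; // the object must really exist
  const fields = new Map();
  const resolve = (arg) =>
    arg !== null && typeof arg === "object" && arg.load === name ? fields.get(arg.field) : arg;
  const out = [];
  for (const op of ops) {
    if (op.kind === "allocate" && op.target === name) {
      for (const [field, value] of Object.entries(op.fields)) fields.set(field, value);
    } else if (op.kind === "storeField" && op.object === name) {
      fields.set(op.field, resolve(op.value));
    } else if (op.kind === "call") {
      out.push({ ...op, args: op.args.map(resolve) });
    } else if (op.kind === "return") {
      out.push({ ...op, value: resolve(op.value) });
    } else {
      out.push(op);
    }
  }
  return out;
}

// var obj = {a: 1337}; console.log(obj.a);
const body = [
  { kind: "allocate", target: "obj", fields: { a: 1337 } },
  { kind: "call", callee: "console.log", args: [{ load: "obj", field: "a" }] },
];
console.log(JSON.stringify(scalarReplace(body, "obj")));
// [{"kind":"call","callee":"console.log","args":[1337]}]
```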

Compiler pipeline summary

Prevalent bug classes

Typer bugs

Off-by-one in typing of `String.indexOf`

Type Typer::Visitor::JSCallTyper(Type fun, Typer* t) {
...
case kStringLastIndexOf:
    // Bug: the upper bound misses String::kMaxLength itself
    return Type::Range(-1.0, String::kMaxLength - 1.0, t->zone());

const kMaxLength = 0x1fffffe8;

function hax() {
    let s = "_".repeat(kMaxLength);
    let bad = s.indexOf("", kMaxLength); // Type = (-1.0, 0x1fffffe7) | Actual = 0x1fffffe8!
    ...
}

Mismatch between assumed type and actual value
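Why the bound is off by one: for an empty search string, `indexOf` returns the (clamped) start position, which can equal the string's length, one past the last valid index. The same effect on a short string:

```javascript
const s = "_".repeat(5);
console.log(s.indexOf("", 5));  // 5: equals s.length, not a valid index
console.log(s.indexOf("", 99)); // 5: the position is clamped to the length
console.log(s.lastIndexOf("")); // 5
```

With a string of length `kMaxLength`, the same call returns `kMaxLength`, which lies outside `Range(-1, kMaxLength - 1)`.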

Abusing this for RCE requires tricking Turbofan into incorrectly removing some checks
Typer hardening is a mitigation that tries to make this impossible
Mitigated techniques include the (now removed) bounds-check elimination, OOB access via Array.prototype.pop(), and the Array iterator's .next()

Exploiting typer bugs on up-to-date Chrome for RCE now requires finding an additional typer hardening bypass 😢

https://bugs.chromium.org/p/chromium/issues/detail?id=1423487 (still restricted!)

Typer bugs

Math.expm1 missing -0 in deduced type
Invalid induction variable typing with ±Infinity
Many more
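The `-0` case is easy to observe from JavaScript. A typer that leaves `-0` out of the result type of `Math.expm1` can let later phases fold checks such as `Object.is(x, -0)` incorrectly:

```javascript
const r = Math.expm1(-0);
console.log(r === 0);          // true: === does not distinguish -0 from +0
console.log(Object.is(r, -0)); // true: the result really is -0
console.log(1 / r);            // -Infinity
```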

Hole leak

Sentinel value in V8 (and some other JavaScript engines) representing an empty slot (a hole) in an array, Map or Set
Leaking this internal value to JavaScript code has led to RCE

var arr = [1.1, 2.2, , , 5.5];
arr[2]; // reads as undefined; the empty slot is a hole (TheHole, or a special hole NaN in double arrays like this one)

Currently very hot topic
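From JavaScript, a hole mostly looks like `undefined`, but it is an absent element rather than a stored value, which `in` and `Object.keys` reveal:

```javascript
var arr = [1.1, 2.2, , , 5.5];
console.log(arr.length);       // 5
console.log(arr[2]);           // undefined
console.log(2 in arr);         // false: index 2 is a hole
console.log(Object.keys(arr)); // [ '0', '1', '4' ]

var filled = [1.1, 2.2, undefined, undefined, 5.5];
console.log(2 in filled);      // true: a real element that happens to be undefined
```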

Recently found in-the-wild bugs

CVE-2021-38003
CVE-2023-2033 (exploited in-the-wild)
CVE-2023-3079 (exploited in-the-wild)

TheHole 2 RCE?

Public (old) way via Map.set() + Map.delete() (now patched)

Mystery how the new in-the-wild (ITW) bugs were exploited
Probably by tricking Turbofan again (no public details yet)!
A lot of work is done trying to mitigate this currently

Type confusions

Somewhat broad class
Dictionary vs. inline object representation (CVE-2018-17463)
Confusion with an arbitrary JS object (CVE-2023-2935)
Map transitions that are not properly caught and handled (crbug.com/944062)

Maps

Every JS object is associated with a map (called shape in SpiderMonkey, structure in WebKit's JavaScriptCore)
Contains information about which properties an object holds and how they are stored
Type confusion if the map is not properly updated after changes to the object
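A sketch of how maps can work: adding a property follows a transition to a successor map, so objects built with the same properties in the same order share one map, and the map alone tells where each property is stored. `HiddenClass` and `makeObject` are hypothetical names, not V8 internals; the last lines show in miniature how a stale map causes a type confusion:

```javascript
class HiddenClass {
  constructor(properties = []) {
    this.properties = properties; // property names; index = storage slot
    this.transitions = new Map(); // property name -> successor HiddenClass
  }
  // Adding a property follows (or creates) a transition to a new hidden class.
  addProperty(name) {
    if (!this.transitions.has(name)) {
      this.transitions.set(name, new HiddenClass([...this.properties, name]));
    }
    return this.transitions.get(name);
  }
  slotOf(name) {
    return this.properties.indexOf(name);
  }
}

const rootClass = new HiddenClass();

// Hypothetical object layout: a hidden-class pointer plus an array of slots.
function makeObject(entries) {
  let hiddenClass = rootClass;
  const slots = [];
  for (const [name, value] of entries) {
    hiddenClass = hiddenClass.addProperty(name);
    slots.push(value);
  }
  return { map: hiddenClass, slots };
}

const o1 = makeObject([["a", 1], ["b", 2]]);
const o2 = makeObject([["a", 3], ["b", 4]]);
const o3 = makeObject([["b", 5], ["a", 6]]);
console.log(o1.map === o2.map); // true: same properties, same order
console.log(o1.map === o3.map); // false: different insertion order

// Type confusion in miniature: the layout changes but the map is not updated.
o1.slots.shift();                          // drop slot 0 ("a") without a new map
console.log(o1.slots[o1.map.slotOf("b")]); // undefined: the stale map points past the data
```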

Exploitation

What primitives do we want?

(Small) out-of-bounds read on the V8 heap is enough* for RCE
Craft out-of-bounds write + addrof with OOB read
Get there by corrupting the length field of an array
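A simulation of why the length field matters: element accesses are bounds-checked against the length stored next to the elements, so corrupting that one field turns the check into a no-op. The flat `heap` layout and the `load` helper are made-up stand-ins, not V8's real object layout:

```javascript
// Made-up flat "heap": an array is stored as [length, elements...],
// immediately followed by some other object.
const heap = new Float64Array(8);
const victim = 0;  // victim array header at slot 0
heap[victim] = 2;  // length 2
heap[1] = 1.1;     // element 0
heap[2] = 2.2;     // element 1
heap[3] = 1337;    // first slot of a neighbouring "object"

// Bounds-checked element load that trusts the length stored on the heap.
function load(arr, index) {
  if (index < 0 || index >= heap[arr]) throw new RangeError("out of bounds");
  return heap[arr + 1 + index];
}

console.log(load(victim, 1)); // 2.2
// A bug that overwrites the length field makes the bounds check useless:
heap[victim] = 1000;
console.log(load(victim, 2)); // 1337, read from the neighbouring object
```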

How though?

Hole leakers (mitigated)
Depends on the bug
We can often trick Turbofan into removing checks based on false assumptions we set up with our bug

addrof + fakeobj

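// getaddrOf() and getfakeObj() stand for the bug-specific setup (not shown)
var addrof = getaddrOf();
var fakeObj = getfakeObj();

var testObj = {a: 1337, b: 420};

// Round trip: faking an object at testObj's address must yield testObj itself
if (fakeObj(addrof(testObj)) !== testObj) throw "fakeObj/addrof bricked!";
print("fakeobj/addrof working as expected!"); // print() is d8's output function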

addrof: get the address of an arbitrary JavaScript object
fakeobj: fake an arbitrary JavaScript object at a chosen address
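Primitives like these often hand out addresses disguised as doubles (e.g. via a double/object array confusion), so exploits commonly pair them with helpers that reinterpret the 64 bits of a double as an integer and back. A sketch using two typed-array views on one buffer; the names `ftoi`/`itof` are conventional, not an API:

```javascript
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);   // view the 8 bytes as a double
const u64 = new BigUint64Array(buf); // view the same 8 bytes as an unsigned 64-bit integer

// double -> raw 64-bit pattern
function ftoi(f) {
  f64[0] = f;
  return u64[0];
}

// raw 64-bit pattern -> double
function itof(i) {
  u64[0] = i;
  return f64[0];
}

console.log(ftoi(1.5).toString(16)); // 3ff8000000000000
```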

addrof

RCE?

New V8 sandbox tries to mitigate RCE even with full read/write on the V8 heap

~~Arbitrary reads/writes by overwriting the TypedArray backing store~~
~~JIT spraying~~
~~ROP through JIT/WASM code pointers~~
Patch bytecode of functions (probably getting killed soon)

Enough talk, bugs where?
