V8 Intro

How does Chrome execute JavaScript?

Liam Wachter, Julian Gremminger

Karlsruhe Institute of Technology

About Us

Julian Gremminger

KIT Computer Science Student
pwn/rev main CTF player @ KITCTF and orgakraut
Found one V8 bug and counting

Liam Wachter

KIT Computer Science Student
rev/pwn main CTF player @ KITCTF, polygl0ts, orga{niser,kraut}
Uses ML for vulnerability discovery @ SAP Security Research

What happens when Chromium displays sadsmileyface.xyz?
  • Reusable part of Chromium, e.g. in Edge or Android WebView
  • Blink is the rendering engine, forked from WebKit, which was itself forked from KDE's KHTML and KJS
  • gin is a wrapper around the V8 API with focus on C++-JS bindings
  • But also heavily used directly, primarily in //renderer/bindings
  • DOM objects are exposed to V8 through FunctionTemplate and ObjectTemplate

HTML parsing

Parsed twice: a fast, partial, approximate pass and a slow, exact pass

Used to be a source of more vulnz, but still has some

V8

  • 🎉 Mostly unsafe C++ 🎉
  • Interpreter, baseline compiler, 2 optimizing compilers
  • > 2M LOC

V8 is a standalone library

E.g. also used by Node.js

  • One instance of V8 operates on one v8::Isolate
  • An isolate can have multiple v8::Contexts
  • //renderer/bindings auto generates C++ V8 API code for DOM interaction, from WebIDL
  • [CustomToV8]
    interface Node {
        const unsigned short ELEMENT_NODE = 1;
        attribute Node parentNode;
        [TreatReturnedNullStringAs=Null] attribute DOMString nodeName;
        [Custom] Node appendChild(Node newChild);
        void addEventListener(DOMString type, EventListener listener,
                              optional boolean useCapture);
    };
                        
What does V8 do with this?
function printA(obj) {
    console.log(obj.a);
}

for (let i = 0; i < 10000; i++) {
    printA({a: i});
}

Parsing

V8 gets the script as text (or as StreamedSource)

Hand written recursive descent parser

Separate passes for AST building and scope analysis (hard)


--- AST ---
FUNC at 0
. KIND 0
. LITERAL ID 0
. SUSPEND COUNT 0
. NAME ""
. INFERRED NAME ""
. DECLS
. . FUNCTION "printA" = function printA

                    

Interpreter: Ignition

  • Unoptimized lowering to bytecode
  • Register machine with implicit accumulator executes bytecode
  • One instruction dispatches the next
IGNITION_HANDLER(Star, InterpreterAssembler) {
  TNode<Object> accumulator = GetAccumulator();
  StoreRegisterAtOperandIndex(accumulator, 0);
  Dispatch();
}
void InterpreterAssembler::Dispatch() {
  Comment("========= Dispatch");
  DCHECK_IMPLIES(Bytecodes::MakesCallAlongCriticalPath(bytecode_), made_call_);
  TNode<IntPtrT> target_offset = Advance();
  TNode<WordT> target_bytecode = LoadBytecode(target_offset);
  DispatchToBytecodeWithOptionalStarLookahead(target_bytecode);
}

Insanely long call chain to arrive at the code doing the actual work
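
To make "register machine with implicit accumulator" and "one instruction dispatches the next" more concrete, here is a minimal sketch of a toy interpreter. The handler names loosely mirror Ignition's (LdaSmi, Star, Add), but the `[op, operand]` encoding, the `state` object and the `run` helper are assumptions for illustration, not V8's real design.

```javascript
// Toy register machine with an implicit accumulator (not V8's real bytecode).
const handlers = {
  // Load a small integer constant into the accumulator.
  LdaSmi(state, value) { state.acc = value; return dispatch(state); },
  // Store the accumulator into register `reg`.
  Star(state, reg) { state.regs[reg] = state.acc; return dispatch(state); },
  // Add register `reg` to the accumulator.
  Add(state, reg) { state.acc = state.regs[reg] + state.acc; return dispatch(state); },
  // Stop and hand back the accumulator.
  Return(state) { return state.acc; },
};

// Advance to the next bytecode and jump to its handler.
// Every handler ends by calling this, so there is no central loop.
function dispatch(state) {
  const [op, operand] = state.code[state.pc++];
  return handlers[op](state, operand);
}

function run(code) {
  return dispatch({ code, pc: 0, acc: undefined, regs: [] });
}

const result = run([
  ["LdaSmi", 40], ["Star", 0],  // r0 = 40
  ["LdaSmi", 2], ["Add", 0],    // acc = r0 + 2
  ["Return"],
]);
console.log(result); // 42
```

Only `Star` needs a register operand for its destination; the source is always the accumulator, which keeps bytecodes short.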

Feedback

What kind of types do we encounter during execution?

function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}

function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- str + str
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i.toString()});
}
  • Speculate based on the type feedback we collect
  • Assume that future execution will also have those types
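
A rough sketch of how feedback for a single `+` site could be collected. The hint names are borrowed from V8's BinaryOperationHint, but `classify`, `merge`, `makeAddSite` and the assumed 31-bit small-integer range are simplifications made up for this example.

```javascript
// Classify one observed pair of operands (simplified, hypothetical logic).
function classify(lhs, rhs) {
  // Assumption: "small" means a 31-bit signed integer, excluding -0.
  const isSmall = (v) =>
    Number.isInteger(v) && !Object.is(v, -0) && v >= -(2 ** 30) && v < 2 ** 30;
  if (isSmall(lhs) && isSmall(rhs)) return "SignedSmall";
  if (typeof lhs === "number" && typeof rhs === "number") return "Number";
  if (typeof lhs === "string" && typeof rhs === "string") return "String";
  return "Any";
}

// Feedback only ever gets more general: None -> specific hint -> Any.
function merge(old, seen) {
  if (old === "None" || old === seen) return seen;
  const numeric = new Set(["SignedSmall", "Number"]);
  if (numeric.has(old) && numeric.has(seen)) return "Number";
  return "Any";
}

// One `+` site with its own feedback slot.
function makeAddSite() {
  const site = { feedback: "None" };
  site.add = (lhs, rhs) => {
    site.feedback = merge(site.feedback, classify(lhs, rhs));
    return lhs + rhs;
  };
  return site;
}

const ints = makeAddSite();
for (let i = 0; i < 100; i++) ints.add(i, i);
console.log(ints.feedback); // "SignedSmall"

const strs = makeAddSite();
for (let i = 0; i < 100; i++) strs.add(i.toString(), i.toString());
console.log(strs.feedback); // "String"
```

A compiler can later read the slot and specialize the addition for exactly the hint it finds there.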

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}

Monomorphic

In bytecode this is:


GetProperty(obj, "a", feedback_cache)

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a); // <-- int + int | str + str
}

for (let i = 0; i < 10000; i++) {
    printAx2({a: i});
}
for (let i = 0; i < 10000; i++) {
    printAx2({a: i.toString()});
}

Polymorphic

Inline Caching


function printAx2(obj) {
    console.log(obj.a + obj.a);
}

printAx2({a: 42});
printAx2({a: 0.23});
printAx2({a: "asdf"});
printAx2({a: {lmao: 7}});                                    
printAx2({a: BigInt(1337)});                                    

Megamorphic
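
The three states can be sketched as a small state machine for one property-load site. This is a toy: the "shape" is just the list of own keys (V8 keys its caches on maps), and the limit of 4 cached shapes as well as the names `shapeOf` and `makeLoadSite` are assumptions for this example.

```javascript
// Assumed upper bound for the polymorphic state in this sketch.
const MAX_POLYMORPHIC = 4;

// Stand-in for an object's map: its own keys in insertion order.
function shapeOf(obj) {
  return Object.keys(obj).join(",");
}

function makeLoadSite(name) {
  const site = { shapes: new Set(), state: "uninitialized" };
  site.load = (obj) => {
    if (site.state !== "megamorphic") {
      site.shapes.add(shapeOf(obj));
      if (site.shapes.size === 1) site.state = "monomorphic";
      else if (site.shapes.size <= MAX_POLYMORPHIC) site.state = "polymorphic";
      else {
        // Too many shapes: give up on per-site caching.
        site.state = "megamorphic";
        site.shapes.clear();
      }
    }
    return obj[name];
  };
  return site;
}

const site = makeLoadSite("a");
site.load({ a: 1 });
console.log(site.state); // "monomorphic"
site.load({ a: 1, b: 2 });
console.log(site.state); // "polymorphic"
for (const extra of ["c", "d", "e"]) site.load({ a: 1, [extra]: 0 });
console.log(site.state); // "megamorphic"
```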

🔥 Hotness 🔥

Functions that get executed a lot with the same feedback get hot 🔥

for (let i = 0; i < 10000; i++) {
    printA({a: i});
}
  • Consider compilation time vs. execution speed
  • Tier-up (further optimize) hot functions
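
A minimal sketch of counter-based tier-up. The threshold of 1000 calls, the `tier` labels and the `withTierUp` helper are invented for illustration; real engines use more elaborate budgets.

```javascript
// Count invocations and swap in a "faster" implementation once the
// function is considered hot. `makeFast` stands in for the compiler.
function withTierUp(slow, makeFast, threshold = 1000) {
  let calls = 0;
  let impl = slow;
  const wrapper = (...args) => {
    if (impl === slow && ++calls >= threshold) {
      impl = makeFast();           // pay compilation cost once
      wrapper.tier = "optimized";
    }
    return impl(...args);
  };
  wrapper.tier = "interpreted";
  return wrapper;
}

const getA = withTierUp(
  (obj) => obj["a"],               // slow path
  () => (obj) => obj.a,            // "compiled" replacement
);

getA({ a: 1 });
console.log(getA.tier); // "interpreted"
for (let i = 0; i < 10000; i++) getA({ a: i });
console.log(getA.tier); // "optimized"
```

The trade-off from the slide shows up in `threshold`: a low value spends compile time on functions that may never run again, a high value leaves hot functions slow for longer.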

Bailouts / Deoptimization

What happens when assumptions are broken?

function printAx2(obj) {
    console.log(obj.a + obj.a);
}

// optimize printAx2
for (let i = 0; i < 10000; i++) {
    printAx2({a: i}); // <-- int
}

printAx2({a: "asdf"}); // <-- str

Optimized code is not correct anymore!
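
A sketch of what a bailout does, under the assumption that the optimized code guards its speculation with a type check. `DEOPT`, `makeDoubleA` and the `deopts` counter are hypothetical names for this example.

```javascript
// Sentinel the specialized code returns when its assumption is violated.
const DEOPT = Symbol("deopt");

function makeDoubleA() {
  // Generic version: correct for every input type.
  const generic = (obj) => obj.a + obj.a;
  // Specialized version: only valid while obj.a is a number.
  const specialized = (obj) =>
    typeof obj.a === "number" ? 2 * obj.a : DEOPT;

  const fn = (obj) => {
    if (fn.optimized) {
      const result = specialized(obj);
      if (result !== DEOPT) return result;
      // Guard failed: discard the optimized code and fall back.
      fn.optimized = false;
      fn.deopts++;
    }
    return generic(obj);
  };
  fn.optimized = true;
  fn.deopts = 0;
  return fn;
}

const doubleA = makeDoubleA();
console.log(doubleA({ a: 21 }));     // 42
console.log(doubleA({ a: "asdf" })); // "asdfasdf"
console.log(doubleA.deopts);         // 1
```

Without the guard, the specialized path would compute `2 * "asdf"` and return NaN, which is the "not correct anymore" case.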

Sparkplug

  • Baseline compiler
  • No speculation, no feedback, no optimization
  • Generates machine code
__ CodeEntry();
{
    RCS_BASELINE_SCOPE(Visit);
    Prologue();
    AddPosition();
    for (; !iterator_.done(); iterator_.Advance()) {
      VisitSingleBytecode();
      AddPosition();
    }
}
void BaselineCompiler::VisitLdaZero() {
  __ Move(kInterpreterAccumulatorRegister,
          Smi::FromInt(0));
}
void BaselineAssembler::Move(
        interpreter::Register output, 
        Register source) {
  return __ movq(RegisterFrameOperand(output),
                 source);
}
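
The same idea as a toy: one linear pass over the bytecode, one fixed template per bytecode, no feedback and no optimization. In this sketch JavaScript source stands in for machine code, and the `[op, operand]` bytecode format, `templates` and `compileBaseline` are assumptions, not Sparkplug's actual interfaces.

```javascript
// One code template per bytecode; nothing here looks at neighbouring
// bytecodes or at runtime feedback.
const templates = {
  LdaZero: () => "acc = 0;",
  LdaSmi: (value) => `acc = ${value};`,
  Star: (reg) => `r[${reg}] = acc;`,
  Add: (reg) => `acc = r[${reg}] + acc;`,
  Return: () => "return acc;",
};

function compileBaseline(bytecode) {
  const lines = ["let acc; const r = [];"];   // prologue
  for (const [op, operand] of bytecode) {      // visit each bytecode once
    lines.push(templates[op](operand));
  }
  return new Function(lines.join("\n"));
}

const compiled = compileBaseline([
  ["LdaSmi", 40], ["Star", 0], ["LdaZero"], ["Add", 0], ["Return"],
]);
console.log(compiled()); // 40
```

Compared with an interpreter, the per-instruction decode and dispatch work is paid once at compile time instead of on every execution.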

Maglev

  • New mid-tier compiler
  • Generates somewhat optimized machine code
  • Not a lot of optimization passes (yet)
  • Inlining, direct lowering

Direct lowering

void MaglevGraphBuilder::VisitAdd() { VisitBinaryOperation<Operation::kAdd>(); }
template <Operation kOperation>
void MaglevGraphBuilder::VisitBinaryOperation() {
  ...
  switch (feedback_hint) {
    case BinaryOperationHint::kNone:
      RETURN_VOID_ON_ABORT(EmitUnconditionalDeopt(
          DeoptimizeReason::kInsufficientTypeFeedbackForBinaryOperation));
    case BinaryOperationHint::kSignedSmall:
    case BinaryOperationHint::kSignedSmallInputs:
    case BinaryOperationHint::kNumber:
    case BinaryOperationHint::kNumberOrOddball: {
      ... // Optimized for number operation
    }
    case BinaryOperationHint::kString:
      ... // Optimized for string operation
    }
  }

Turbofan


  • Compiles one function at a time
  • SSA (static single assignment) + graph-based optimization pipeline
  • Highly specific, optimized code
  • Relies on feedback from lower tiers
function f(obj, i) {
    if (i % 2 == 0) {
        obj.a = obj.a + 1;
    }
}
  • Sea of nodes
  • Value, control, effect edges
  • Effect edges order stateful operations and track side effects!
  • Insanely² complex
  • 🚀 Bugs (were?) everywhere 🚀
  • Turboshaft (new Turbofan IR) brings more bugs

What happens when side effects are modeled incorrectly?

CVE-2018-17463
- V(CreateObject, Operator::kNoWrite, 1, 1)                            \
+ V(CreateObject, Operator::kNoProperties, 1, 1)                       \
                
Object.create was marked as not writing to the side-effect chain (kNoWrite), but it can have side effects! Result: incorrect removal of important checks!

Optimizations

Most passes consist of n reducers that are applied to the graph until a fixpoint is reached

struct TypedLoweringPhase {
  DECL_PIPELINE_PHASE_CONSTANTS(TypedLowering)
  void Run(PipelineData* data, Zone* temp_zone) {
        ...
        AddReducer(data, &graph_reducer, &dead_code_elimination);
        AddReducer(data, &graph_reducer, &constant_folding_reducer);
        AddReducer(data, &graph_reducer, &simple_reducer);
        AddReducer(data, &graph_reducer, &common_reducer);
        ...
        graph_reducer.ReduceGraph();
    }
}

DeadCodeElimination, ConstantFoldingReducer, MachineOperatorReducer, ...

Example: MachineOperatorReducer

Reduction MachineOperatorReducer::Reduce(Node* node) {
    ...
    case IrOpcode::kInt32Mul: {
      Int32BinopMatcher m(node);
      if (m.right().Is(0)) return Replace(m.right().node());  // x * 0 => 0
      if (m.right().Is(1)) return Replace(m.left().node());   // x * 1 => x
      if (m.right().Is(-1)) {  // x * -1 => 0 - x
        node->ReplaceInput(0, Int32Constant(0));
        node->ReplaceInput(1, m.left().node());
        NodeProperties::ChangeOp(node, machine()->Int32Sub());
        return Changed(node);
      }
      ...
    }
}
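
A toy version of the same three rules, applied until nothing changes. The node layout (`op`, `left`, `right`) and the helper names are invented for this sketch. As in the excerpt, the rules are meant for machine-level integer multiplication; `x * 0 => 0` would not be valid for arbitrary JavaScript numbers (NaN, -0).

```javascript
// Tiny expression nodes (hypothetical layout).
const constant = (value) => ({ op: "Const", value });
const param = (name) => ({ op: "Param", name });
const mul = (left, right) => ({ op: "Mul", left, right });
const sub = (left, right) => ({ op: "Sub", left, right });

const isConst = (node, value) => node.op === "Const" && node.value === value;

// Returns a replacement node, or null when no rule applies.
function reduceMul(node) {
  if (node.op !== "Mul") return null;
  if (isConst(node.right, 0)) return node.right;   // x * 0 => 0
  if (isConst(node.right, 1)) return node.left;    // x * 1 => x
  if (isConst(node.right, -1)) {                   // x * -1 => 0 - x
    return sub(constant(0), node.left);
  }
  return null;
}

// Reduce inputs first, then keep applying reducers to the node itself
// until none of them reports a change (fixpoint).
function reduceGraph(node, reducers) {
  if (node.left) {
    node = {
      ...node,
      left: reduceGraph(node.left, reducers),
      right: reduceGraph(node.right, reducers),
    };
  }
  for (const reducer of reducers) {
    const replacement = reducer(node);
    if (replacement) return reduceGraph(replacement, reducers);
  }
  return node;
}

// (x * 1) * -1  reduces to  0 - x
const reduced = reduceGraph(
  mul(mul(param("x"), constant(1)), constant(-1)),
  [reduceMul],
);
console.log(JSON.stringify(reduced));
```

One reduction can expose another: only after the inner `x * 1` collapses to `x` does the outer node become the simple `x * -1` pattern.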

Phase examples: Inlining

  • Classic optimization
  • Based on corresponding bytecode size and call frequency (phi call sites)
function g(i: int) {
    return i & 3;
}

function f(i: int) {
    if (g(i) <= 3) {
        i = 0;
    }
    return i;
}

function f(i: int) {
    if ((i & 3) <= 3) {
        i = 0;
    }
    return i;
}
Enables strong optimization possibilities
function f(i: int) {
    return 0;
}

Phase examples: Typer

  • Runs early on to associate types with nodes
  • Internal types (differ from JS "types")
  • Range(0,1), Signed31, HeapConstant
  • Further optimizations can consider and refine possible type information

function f(i) {
    return i & 3;
}
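
For `i & 3` a typer can derive a range without knowing anything about `i`. A minimal sketch of that reasoning follows; `range`, `typeBitwiseAnd` and the plain `{min, max}` objects are assumptions that only imitate the Range(min, max) notation above.

```javascript
const range = (min, max) => ({ min, max });
const int32 = range(-(2 ** 31), 2 ** 31 - 1);

// If one operand is known to be non-negative, the result of `&` is
// non-negative and cannot exceed that operand's maximum.
function typeBitwiseAnd(lhs, rhs) {
  let { min, max } = int32;
  if (lhs.min >= 0) { min = 0; max = Math.min(max, lhs.max); }
  if (rhs.min >= 0) { min = 0; max = Math.min(max, rhs.max); }
  return range(min, max);
}

// `i` is an arbitrary int32, 3 is the constant Range(3, 3).
const type = typeBitwiseAnd(int32, range(3, 3));
console.log(type); // { min: 0, max: 3 }

// A later pass can use the range to fold comparisons such as `(i & 3) <= 3`.
const comparisonAlwaysTrue = type.max <= 3;
console.log(comparisonAlwaysTrue); // true
```

Every later optimization trusts this range, which is why a range that is too narrow by even one value is a security problem rather than just a missed optimization.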

Phase examples: Escape analysis

  • Created objects that do not escape a function can possibly be optimized out
  • Special consideration needed during bailouts => Rematerialization
function f() {
    var obj = {a: 1337};
    console.log(obj.a);
}
                    
function f() {
    console.log(1337);
}

Compiler pipeline summary

Prevalent bug classes

Typer bugs

Off-by-one in typing of String.indexOf

Type Typer::Visitor::JSCallTyper(Type fun, Typer* t) {
...
case kStringLastIndexOf:
    return Type::Range(-1.0, String::kMaxLength - 1.0, t->zone());
const kMaxLength = 0x1fffffe8;

function hax() {
    let s = "_".repeat(kMaxLength);
    let bad = s.indexOf("", kMaxLength); // Type = (-1.0, 0x1fffffe7) | Actual = 0x1fffffe8!
    ...
}
Mismatch between assumed type and actual value
  • Abusing this for RCE requires tricking Turbofan into incorrectly removing some checks
  • Typer hardening mitigation tries to make this impossible
  • Techniques already mitigated: bounds-check elimination (removed), OOB via Array.pop(), Array iterator .next()
Exploiting typer bugs on up-to-date Chrome for RCE now requires finding an additional typer hardening bypass 😢
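
Why an off-by-one in a range matters can be shown with a small simulation. It models a compiler that drops a bounds check whenever the typer's range proves the index is in bounds; `compileLoad`, `range` and the 4-element array are made up for this sketch and do not correspond to V8 code.

```javascript
const range = (min, max) => ({ min, max });

// "Compile" an element load. The check is only emitted when the typer
// cannot prove that the index is within [0, length).
function compileLoad(length, indexType) {
  const provenInBounds = indexType.min >= 0 && indexType.max < length;
  if (provenInBounds) {
    // Bounds check eliminated based on the typer's claim.
    return { checked: false, load: (elements, index) => elements[index] };
  }
  return {
    checked: true,
    load: (elements, index) => {
      if (index < 0 || index >= length) throw new RangeError("index out of bounds");
      return elements[index];
    },
  };
}

const elements = [10, 20, 30, 40];
const actualIndex = 4;  // the value the operation really produces

const sound = compileLoad(elements.length, range(0, 4)); // correct type
const buggy = compileLoad(elements.length, range(0, 3)); // off by one

console.log(sound.checked, buggy.checked);       // true false
console.log(buggy.load(elements, actualIndex));  // undefined
```

In this simulation the unchecked load just yields `undefined`; in generated machine code the same access would touch memory behind the array.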

https://bugs.chromium.org/p/chromium/issues/detail?id=1423487 (still restricted!)

Typer bugs

  • Math.expm1 missing -0 in deduced type
  • Invalid induction variable typing with +-Infinity
  • Many more

Hole leak

  • Sentinel value in V8 (and some other JavaScript engines) representing an empty slot (hole) in an array, map or set
  • Leaking this internal value to JavaScript resulted in RCE
var arr = [1.1, 2.2, , , 5.5];
arr[2]; // internally represented by TheHole
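
From script, a hole is normally never visible as a value: reading it yields `undefined`, and only membership checks reveal the difference. A short example using standard JavaScript semantics only:

```javascript
const arr = [1.1, 2.2, , , 5.5];

console.log(arr.length);        // 5
console.log(arr[2]);            // undefined (the hole itself is not exposed)
console.log(2 in arr);          // false: there is no element at index 2
console.log(Object.keys(arr));  // indices "0", "1" and "4" only

// An explicitly stored undefined is a real element, not a hole.
const filled = [1.1, undefined];
console.log(1 in filled);       // true
```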

Currently very hot topic

Recently found in-the-wild bugs

  • CVE-2021-38003
  • CVE-2023-2033 (exploited in-the-wild)
  • CVE-2023-3079 (exploited in-the-wild)

TheHole 2 RCE?

Public (old) way via Map.set() + Map.delete() (now patched)

  • Mystery how the new ITW bugs were exploited
  • Probably by tricking Turbofan again (no public details yet)!
  • A lot of work is done trying to mitigate this currently

Type confusions

  • Somewhat broad class
  • Dictionary vs. inline object representation (CVE-2018-17463)
  • Confusion with arbitrary JS object (CVE-2023-2935)
  • Map transitions that are not properly caught and handled (crbug.com/944062)

Maps

  • Every JS object is associated with a map (shape/structure in SpiderMonkey/WebKit)
  • Contains information about which properties an object holds and how they are stored
  • Type confusions if map is not properly updated after changes to the object
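
A toy model of maps as a transition tree, to show how objects created the same way end up sharing one map. `ToyMap`, `makeObject` and the flat `slots` array are invented for this sketch and leave out everything else a real map stores.

```javascript
class ToyMap {
  constructor(properties = []) {
    this.properties = properties;   // property names in insertion order
    this.transitions = new Map();   // property name -> successor map
  }
  // Adding a property follows an existing transition or creates a new map.
  transition(name) {
    if (!this.transitions.has(name)) {
      this.transitions.set(name, new ToyMap([...this.properties, name]));
    }
    return this.transitions.get(name);
  }
  // Index of the property's storage slot, or -1 if the property is absent.
  slotOf(name) {
    return this.properties.indexOf(name);
  }
}

const rootMap = new ToyMap();

// Objects are just a map pointer plus raw slots; the map gives them meaning.
function makeObject(fields) {
  let map = rootMap;
  const slots = [];
  for (const [name, value] of Object.entries(fields)) {
    map = map.transition(name);
    slots.push(value);
  }
  return { map, slots };
}

const p = makeObject({ a: 1, b: 2 });
const q = makeObject({ a: 3, b: 4 });
const r = makeObject({ b: 5, a: 6 });

console.log(p.map === q.map);             // true: same properties, same order
console.log(p.map === r.map);             // false: different insertion order
console.log(p.slots[p.map.slotOf("b")]);  // 2
```

Interpreting `r.slots` with `p.map` would read `b` as `a` and vice versa, which is the essence of a type confusion caused by a stale map.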

Exploitation

What primitives do we want?

  • (Small) out-of-bounds read on the V8 heap is enough* for RCE
  • Craft out-of-bounds write + addrof with OOB read
  • Get there by corrupting length of an array
How though?
  • Hole leakers (mitigated)
  • Depends on the bug
  • We can often trick Turbofan into removing checks based on false assumptions we set up with our bug

addrof + fakeobj

const addrof = getaddrOf();
const fakeObj = getfakeObj();

var testObj = {a: 1337, b: 420};

// Round trip: faking an object at testObj's address must yield testObj itself
if (fakeObj(addrof(testObj)) !== testObj) throw "fakeObj/addrof bricked!";
print("fakeObj/addrof working as expected!");
  • Get address of arbitrary JavaScript objects
  • Fake arbitrary JavaScript objects

addrof

RCE?

New V8 sandbox tries to mitigate RCE even with full read/write on the V8 heap
  • Arbitrary read/writes by overwriting TypedArray backing store
  • JIT spraying
  • ROP through JIT/WASM code pointers
  • Patch bytecode of functions (probably getting killed soon)

Enough talk, bugs where?

References

[1] Kalvin Lee. Content module. Available at: https://github.com/chromium/chromium/tree/main/content. Accessed: 2023-07-24.
[2] Ovidio de Jesús Ruiz-Henríquez, John Abd-El-Malek. Anatomy of the browser 201. Available at: https://youtu.be/u7berRU9Qys. Accessed: 2023-07-24.
[3] Shu-yu Guo. Life of a Script. Available at: https://youtu.be/veYjbF1rt5o. Accessed: 2023-07-24.
[4] Jakob Gruber, Leszek Swirski, Toon Verwaest. Maglev. Available at: https://docs.google.com/document/d/13CwgSL4yawxuYg3iNlM-4ZPCB8RgJya6b8H_E2F-Aek. Accessed: 2023-07-26.