programming JavaScript Nov 28, 2016

JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)

You are reading an outdated version of this article! Please read the latest version instead: JavaScript Engines Hidden Classes (and Why You Should Keep Them in Mind)

When V8 lead engineer Lars Bak¹ describes V8 design decisions, the first thing he talks about is hidden classes.

To better understand hidden classes we need to know what problem they solve. This article is a first introduction to this concept. It will:

cover fast property access because it's something hidden classes make possible,
explain what hidden classes are and where they come from,
tell you how you can observe them in their natural habitat because Chromium provides us with an awesome microscope,
~~say a few things about inline caching because here as well using hidden classes can lead to big performance boost,~~
~~show you why keeping them in mind can improve your JavaScript performance eg. by avoiding property deletion or polymorphic call-sites.~~²

Properties in an OO World

In a class-based object-oriented language such as C++, Java or C# in which you cannot add or delete methods and properties from an object on the fly, accessing properties is generally not costly. You can store object properties at fixed memory offsets because the object layout for an instance of a given class will never change.³ In this case, accessing a property can often be done with a single instruction: load the thing located at the given memory offset.

In an imaginary Java virtual machine, a Java object could be stored in memory as a simple structure that is exactly the same for all instances of the same class. The properties (attributes, methods, …) are either primitive Java types (int, float, …) or pointers (to arrays, functions, …). This structure doesn't hold the "whole" object data, it merely holds references (memory offsets) to where the "real" data is stored. Alternatively, another JVM could store an object as three simple pointers: the first one to the class object representing the type of the object, the second one to a table holding pointers to the object's methods, the third one to the memory allocated for the object data.
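
To make the "fixed memory offset" idea concrete, here is a minimal sketch written in JavaScript. It is only an illustration: makePoint, POINT_X and POINT_Y are names I made up, and a Float64Array stands in for a raw block of memory. No real JVM is implemented this way.

```javascript
// Hypothetical illustration: a "class" whose layout is fixed once and for all.
// Every instance stores its fields in the same slots, so a field name can be
// turned into a numeric offset ahead of time.
const POINT_X = 0; // offset of field x in every Point instance
const POINT_Y = 1; // offset of field y in every Point instance

function makePoint(x, y) {
  // A Float64Array stands in for a raw block of memory with two slots.
  const memory = new Float64Array(2);
  memory[POINT_X] = x;
  memory[POINT_Y] = y;
  return memory;
}

const a = makePoint(12, 3);
const b = makePoint(5, 9);

// "Property access" is a single indexed load; no name lookup is involved.
console.log(a[POINT_X], b[POINT_Y]); // 12 9
```

Because the layout never changes, the offsets can be baked into the generated code, which is what makes the access so cheap.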

By now you have probably noticed that I'm mainly talking about strategies for storing an object in memory and accessing its properties. We call this property access, which means retrieving the value of an object property. It's a mechanism we use all the time; here is how we leverage it in JavaScript:

const o = { // object
  f: (x) => parseInt(x, 13),
  1337: 13.37 + 20
};

o.f // property access => function f
o[1337] // property access => 33.37
o.f(o[1337]) // performs two property accesses and one function call => 42

As you may know, JavaScript is prototype-based (and class-free, not class-based).

Objects are mutable in JavaScript, so we can augment the new instances, giving them new fields and methods. These can then act as prototypes for even newer objects. We don't need classes to make lots of similar objects.
Douglas Crockford

Not only are objects created by "cloning" existing prototypes, they can also be created by using the literal notation (or initializer notation), and the resulting objects can just as easily be modified on the fly.

An important part of running some JavaScript code is creating objects, putting them in memory and retrieving them or only retrieving some of their properties. We will focus here on ordinary JavaScript objects (as opposed to exotic objects) and discuss property access techniques.

The Property Access Problem: Dynamic Lookups are Slow

Let's get back to the topic: how can we implement property access? Looking at what the ECMAScript 2015 specification proposes is a good starting point.

Under section 9 Ordinary and Exotic Objects Behaviours, 9.1.8 [[Get]] describes the following algorithm (simplified for the purpose of this article):

When we do obj[prop] or obj.prop, …

1. Make sure typeof prop is either 'string' or 'symbol'. (obj[13] in fact does obj['13'].)
2. If prop is a direct (own) property of obj, return obj[prop], even if that value is undefined. End.
3. If prop is not a direct property of obj, then
   a. Let parent be obj's prototype.
   b. If parent is null, return undefined. End.
   c. Walk the prototype chain: go back to 1. but with parent instead of obj (which means it'll retry the same procedure with parent[prop]).

This is called dynamic lookup. This lookup is dynamic because at runtime we try to find prop on obj; if we fail, we try the same on its prototype, then on the prototype's prototype, etc.
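
The simplified algorithm can be sketched in a few lines of JavaScript. This is only my approximation, not what an engine runs: dynamicGet is a made-up name, and getters, Proxies and other exotic objects are ignored. Like the specification, it returns an own property's value even when that value is undefined.

```javascript
// Sketch of the simplified [[Get]] walk (data properties only).
function dynamicGet(obj, prop) {
  // Step 1: property keys are either strings or symbols.
  const key = typeof prop === 'symbol' ? prop : String(prop);

  let current = obj;
  while (current !== null) {
    // Step 2: is key an own property of the current object?
    const descriptor = Object.getOwnPropertyDescriptor(current, key);
    if (descriptor !== undefined) {
      return descriptor.value;
    }
    // Step 3: otherwise retry on the prototype.
    current = Object.getPrototypeOf(current);
  }
  // The prototype chain is exhausted.
  return undefined;
}

const base = { greeting: 'hello' };
const derived = Object.create(base);
derived[13] = 'thirteen';

console.log(dynamicGet(derived, 13));         // 'thirteen' (own property, key '13')
console.log(dynamicGet(derived, 'greeting')); // 'hello' (found on the prototype)
console.log(dynamicGet(derived, 'missing'));  // undefined (chain exhausted)
```

Every failed step costs one more name-based search, so the deeper the chain, the slower the miss.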

We could implement a (big) dictionary (or associative array) where you would store all the objects used by the program. Keys would be references to objects and the values would in turn be dictionaries, with property names as keys and property values as values:

// Your JavaScript code:
function Point(x, y) {
  this.x = x;
  this.y = y;
}
const p1 = new Point(12, 3);
const p2 = new Point(5, 9);
const p3 = new Point(12);
const lit = { name: 'Bond' };

// The JavaScript engine's dictionary storing the objects
const allObjects = {
  o0: { // p1
    __proto__: 'p6a1251', // Object{constructor: Point(x, y), __proto__: Object{constructor: Object()}}
    x: 12,
    y: 3
  },
  o1: { // p2
    __proto__: 'p6a1251', // Object{constructor: Point(x, y), __proto__: Object{constructor: Object()}}
    x: 5,
    y: 9
  },
  o2: { // p3
    __proto__: 'p6a1251', // Object{constructor: Point(x, y), __proto__: Object{constructor: Object()}}
    x: 12
  },
  o3: { // lit
    __proto__: 'p419ecc', // Object{constructor: Object()}
    name: 'Bond'
  }
}

V8 is not implemented in JavaScript, but this should give you a basic idea of how we could store all the objects used in a JavaScript program using a dictionary. (This dictionary would probably be implemented as a hash table.)

Now let's say we want to get p1.z, p1 being the first object created in our program hence having o0 as reference. Let's roughly follow the algorithm we took from the ECMAScript spec:

Find o0 in allObjects (lookup to resolve the object's location in memory),
Find a property named "z" in o0 (dynamic lookup to resolve the property's location in memory) and return its value. (Should we have tried p1.x, we could have stopped here, returning 12.)
Since o0 does not have a property named "z", fetch o0.__proto__ to see if it has a property named "z", otherwise look if o0.__proto__.__proto__ has a property named "z", repeat this process down the prototype chain until __proto__ is null (which means no object prototype was found).

As you can probably guess, this process is not efficient.
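
To get a feel for the cost, here is a rough sketch of that dictionary-based lookup. Everything in it is hypothetical (the heap Map, the reference strings, the lookup function); it simply counts how many hash table lookups are needed before an answer comes back.

```javascript
// Hypothetical engine-side storage: every object lives in one big dictionary,
// keyed by a made-up reference string.
const heap = new Map([
  ['proto:Object', { proto: null, props: new Map() }],
  ['proto:Point', { proto: 'proto:Object', props: new Map() }],
  ['o0', { proto: 'proto:Point', props: new Map([['x', 12], ['y', 3]]) }],
]);

function lookup(ref, name) {
  let hashLookups = 0;
  let currentRef = ref;
  while (currentRef !== null) {
    // First lookup: resolve the reference to the object's record.
    const record = heap.get(currentRef);
    hashLookups += 1;
    // Second lookup: search the property by name (has + get counted as one).
    hashLookups += 1;
    if (record.props.has(name)) {
      return { value: record.props.get(name), hashLookups };
    }
    // Not found: move on to the prototype.
    currentRef = record.proto;
  }
  return { value: undefined, hashLookups };
}

console.log(lookup('o0', 'x')); // { value: 12, hashLookups: 2 }
console.log(lookup('o0', 'z')); // { value: undefined, hashLookups: 6 }
```

Even the successful case needs two hash lookups where a class-based language would need a single load at a known offset.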

The Property Access Solution: Hidden classes

Instead of resorting to dynamic lookup to access properties, V8 implements hidden classes, a concept originally present in another prototype-based programming language: SELF. Here is a quote from the abstract of the 1989 paper which first described this idea: (emphasis is mine)

[…] SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF's lack of classes and explicit variables.

To compensate for the absence of classes, our system uses implementation-level maps to transparently group objects cloned from the same prototype […]
C. Chambers, D. Ungar, and E. Lee. “An Efficient Implementation Of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes.” SIGPLAN Not. 24, no. 10 (September 1989): 49–70.

Hidden class is a better, more explicit name for what this paper calls a map. It avoids confusion with the Map data structures and their JavaScript implementation, Map Objects, although in fact they really are named maps in V8, a reference to SELF. This short vocabulary brief will prove handy when we start digging into V8. I'll mostly stick to the hidden classes terminology but don't be surprised if I drop a map here and there for variety.

Most of the modern JavaScript engines we use today implement similar approaches or hidden class variants. Safari's JavaScriptCore has structures. Microsoft Edge's ChakraCore has types. Firefox's SpiderMonkey has shapes:

There are a number of data structures within SpiderMonkey dedicated to making object property accesses fast. The most important of these are Shapes. […] Shapes are linked into linear sequences called “shape lineages”, which describe object layouts. Some shape lineages are shared and live in “property trees”. Other shape lineages are unshared and belong to a single JS object; these are “in dictionary mode”.
Nicholas Nethercote

Everyone seems to use variants of these hidden classes, but what do they look like and how are they generated? Let's take a look at what V8 does. Here is our Point function again:

function Point(x, y) {
  this.x = x;
  this.y = y;
}
const p1 = new Point(13, 37);
const p2 = new Point(2);
const p3 = new Point();
const p4 = new Point(4, 2);

From what we read about hidden classes (or maps) we could expect our engine to create a map for Point as soon as it assigns p1 and reuse this same map for p2, p3 and p4. Not quite. We are in fact looking at 3 related but different maps here. (Although this part is well illustrated in the V8 design reference, their documentation hasn't been updated for almost 4 years and it's worth paraphrasing.)

Let's start with the first part of our code. We define a function Point which we'll use as constructor for many points. It has two parameters, x and y, and its objects will remember the arguments we pass to this Point constructor.

function Point(x, y) {
  this.x = x;
  this.y = y;
}

At this point in time V8 puts our function in memory, but we don't really care about this; our point is what happens when we create a point:

const p1 = new Point(13, 37);

When V8 sees our first usage of the Point function (or constructor) to create a new object based on this Point prototype, it doesn't yet know what a Point really looks like. All it knows is function Point, so it creates an initial version C0 of the hidden class p1 needs.

This first hidden class C0 represents a Point object without any property ({}). At the same time, V8 allocates our p1 variable as some memory object containing nothing but a class pointer to store the fact that p1's hidden class will be C0 for now.

Entering the Point function with our arguments 13 and 37, the next thing V8 encounters is this.x = x;, which resolves to this.x = 13;. Aha! The point p1 (which is this here) has a property called x and this property is not part of the map C0! First things first, 13 is put in memory at this object's first memory offset (the spots where all the data contained in an object is stored) - we'll call it offset 0. V8 then creates a new hidden class C1 based on C0. What C1 brings to the table is a reference to a property named x, for which the data is stored at the object's offset 0.

V8 modifies C0 to tell it that each time an object with the hidden class C0 gets a property named x added to it, this object will have to transition to using C1 instead as its hidden class.

Next line is this.y = y;, ie. this.y = 37. Here is what happens internally: First 37 is stored at the next memory offset, offset 1. Then a new hidden class C2 is created by cloning C1 (I say cloning because C2 is the same as C1 at this point: it has a reference to a property named x for which the data is at offset 0). C2 receives the additional ability to have a property named y, for which the data can be found at its object's offset 1.

Now that we have a more capable hidden class than C1, C1's transition plan is updated to tell all C1 objects that they can transition to using C2 should they get a property named y set to them.
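
The whole dance can be modelled with a toy implementation. This is a sketch under heavy assumptions, not V8's actual data structures: HiddenClass, ToyObject, toyPoint and the slots array are names I invented to mirror the description above (a hidden class maps property names to offsets and remembers which hidden class to transition to when a property is added).

```javascript
// Toy model of hidden classes (maps). All names here are hypothetical.
class HiddenClass {
  constructor(offsets = new Map()) {
    this.offsets = offsets;       // property name -> slot offset
    this.transitions = new Map(); // property name -> next HiddenClass
  }

  // Return the hidden class to use once `name` is added, creating it if needed.
  transitionFor(name) {
    let next = this.transitions.get(name);
    if (next === undefined) {
      const offsets = new Map(this.offsets); // clone the current layout
      offsets.set(name, offsets.size);       // new property takes the next free slot
      next = new HiddenClass(offsets);
      this.transitions.set(name, next);      // remember the transition for later objects
    }
    return next;
  }
}

class ToyObject {
  constructor(initialMap) {
    this.map = initialMap; // pointer to the current hidden class
    this.slots = [];       // property values, addressed by offset
  }

  set(name, value) {
    if (!this.map.offsets.has(name)) {
      this.map = this.map.transitionFor(name);
    }
    this.slots[this.map.offsets.get(name)] = value;
  }

  get(name) {
    const offset = this.map.offsets.get(name);
    return offset === undefined ? undefined : this.slots[offset];
  }
}

const C0 = new HiddenClass(); // the empty hidden class
function toyPoint(x, y) {
  const p = new ToyObject(C0);
  p.set('x', x); // C0 -> C1
  p.set('y', y); // C1 -> C2
  return p;
}

const q1 = toyPoint(13, 37);
const q2 = toyPoint(4, 2);
const C1 = C0.transitions.get('x');
const C2 = C1.transitions.get('y');

console.log(q1.map === C2, q2.map === C2);             // true true
console.log(C2.offsets.get('x'), C2.offsets.get('y')); // 0 1
console.log(q1.get('y'));                              // 37

// Adding the same properties in a different order follows other transitions.
const r = new ToyObject(C0);
r.set('y', 1);
r.set('x', 2);
console.log(r.map === C2); // false
```

In this model the second point reuses the transitions created by the first one, so both end up sharing C2, while an object built in a different order ends up with a different hidden class.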

This might all seem a bit schematic and by now you are probably wondering if you could observe this behaviour all by yourself. Fortunately, Chrome developer tools allow us to do just that. Kudos to their team for exposing hidden classes to the end users. Someone at Mozilla told me they considered adding this capability to their own developer tools but nobody implemented it… yet.

If you want to play along, open a tab in Google Chrome or Chromium, fire up the devtools, open the devtools settings (focus the devtools and hit F1) and enable Show advanced heap snapshot properties in the Profiler section of the Preferences tab.

Close the settings, go to the console, copy the following code and evaluate it:

function Point(x, y) {
  this.x = x;
  this.y = y;
}

const p0 = new Point();       // C0
const p1 = new Point(13);     // C1 ← C0
const p2 = new Point(13, 37); // C2 ← C1 ← C0

Switch to the Profiles tab, select Take Heap Snapshot and hit the Take Snapshot button. Once this is done, a couple of MB of data have been collected and we can use the Class filter box to filter our objects. Just type Point, the name of our function. You should be able to see the three points we just created. I uncollapsed a few object properties, here's what I see:

A few basics to get us started:

Point @81129: an object with id 81129, its prototype is Point. This is p0 by the way. Next one is p1, then p2.
- __proto__ :: @79373 the id of this object's prototype is 79373. See how all our three points have this same prototype with the same id @79373?

Note that you can "preview" some objects by hovering their id (eg. @123):

Remember when I said that internally V8 hidden classes were named maps? Yep, that's what we're looking at, how exciting!

Point @81129
- map :: system / Map @79399 The hidden class for p0, our C0! Now uncollapse map:
  - back_pointer :: system / Map @79395
    - back_pointer :: system / Map @79375
      - transition :: system @79435
        2 :: system / Map @79457
Point @81133
- map :: system / Map @79453 The hidden class for p1, ie. C1.
Point @81137
- map :: system / Map @79457 The hidden class for p2, ie. C2.

1 Lars Bak spent the last 30 years implementing and optimizing virtual machines. He worked on the SELF, Strongtalk, HotSpot, V8 and now Dart VMs. The best parts of V8 come from his previous experience. From SELF (which is similar to JavaScript in that they're both prototype-based OO languages) came inline caching, inlining and deoptimization. What was learned from Strongtalk became a big part of HotSpot's success, which then heavily influenced V8 (JIT). Then came Dart, inspired by Smalltalk, JavaScript, C# and Erlang, while its VM only kept the best parts of SELF, Strongtalk and HotSpot. Notice the name HotSpot? It comes from its ability to profile bytecode at runtime and target "hot spots" (frequently executed parts of code, eg. hot functions or hot code) for optimization, just like V8 does. ↩

2 This was my initial plan. I got lost on my way. The draft of this blog post has been sitting on my hard drive since February. I might publish a follow-up someday. 😃 ↩

3 edited 2016/11/30: I incorrectly compared Java and Smalltalk. In Smalltalk, it is possible to add methods or properties to objects instantiated from a class, as pointed out by MoTTs_. Their comment introduces a very relevant distinction between compile-time classes and inheritance and runtime classes and inheritance. ↩

#v8 #crankshaft #javascript