
Question: Exporting a larger module from c++ seems to have a performance overhead when calling functions from js? #1103

Open
twuky opened this issue Nov 10, 2021 · 14 comments

@twuky commented on Nov 10, 2021

This is just a question, really; I'm not running into any actual errors at the moment.
I've been experimenting with using this API to create bindings for this graphics library: https://github.com/raysan5/raylib
I started with the Yeoman template and wrote a binding that exported 13 functions so I could test loading textures and drawing them to the screen. The binding's functions simply wrap library functions; as in the template, I add them to a Napi::Object in Init and export it.

Once I got the hang of this API, I started writing a script to generate a .cc file with bindings for the rest of the library's functions. The generated file compiles and runs, though I haven't tested every function yet. This version has 475 exported functions.

I wrote a quick 'benchmark' that measures how many textures can be drawn per frame while staying above a target frame rate. My initial 13-function test bindings run this benchmark about 20% faster than the module with 475 functions.

I understand there is some overhead in transferring data from JS to the C++ addon, but do you have any information on why adding more functions to the addon would slow things down like this? Is there another way to structure the module that better handles a large number of functions, or that prioritizes the ones called most frequently?

@twuky changed the title from "Exporting a larger module from c++ seems to have a performance overhead when calling functions from js?" to "Question: Exporting a larger module from c++ seems to have a performance overhead when calling functions from js?" on Nov 10, 2021
@mhdawson (Member) commented on Nov 12, 2021

There is nothing in Node-API itself that makes us believe it should be slower with more functions on a single object.

In terms of general JavaScript, looking up a function on an object with more properties could take longer as the number of properties grows, and V8 might find it harder to optimize when those properties are native functions.

A good experiment might be to try saving the properties from the exports object into local variables once at the start and then calling the saved values.

@gabrielschulhof will add an example following the meeting.
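A minimal sketch of that experiment, with 475 plain JavaScript functions standing in for the native exports (the names `fakeExports`, `fn0` to `fn474`, `timeIt`, and `cachedFn` are hypothetical). Because the stand-ins are not native functions, this isolates only the JavaScript property lookup, not the Node-API call itself.

```javascript
const { performance } = require("perf_hooks");

// Hypothetical stand-in for a large exports object: 475 plain JS functions.
// Real Node-API exports are native functions, so absolute timings will differ.
const fakeExports = {};
for (let i = 0; i < 475; i++) {
  fakeExports[`fn${i}`] = (x) => x + i;
}

// Call `body` `iterations` times; return elapsed ms and the summed results
// (the sum keeps the calls from being dead code).
function timeIt(iterations, body) {
  const start = performance.now();
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += body(i);
  return { elapsed: performance.now() - start, acc };
}

const N = 1e6;

// Variant A: look the function up on the exports object at every call.
const viaProperty = timeIt(N, (i) => fakeExports.fn474(i));

// Variant B: read the property once, then call the saved local.
const cachedFn = fakeExports.fn474;
const viaLocal = timeIt(N, (i) => cachedFn(i));

console.log(`property lookup: ${viaProperty.elapsed.toFixed(2)} ms`);
console.log(`cached local:    ${viaLocal.elapsed.toFixed(2)} ms`);
```

If the two timings come out close, that is consistent with V8's inline caches making repeated lookups of the same property cheap. Run each variant several times, since JIT warm-up can dominate a single run.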

@gabrielschulhof (Contributor) commented on Nov 12, 2021

So, instead of

const addon = require("my_addon");
addon.someFunction(/*...*/);
addon.someOtherFunction(/*...*/);
addon.yetAnotherFunction(/*...*/);
/*...*/

You can write

const { someFunction, someOtherFunction, yetAnotherFunction, /*...*/ } = require("my_addon");
someFunction(/*...*/);
someOtherFunction(/*...*/);
yetAnotherFunction(/*...*/);
/*...*/

Admittedly it's much more verbose, but if the performance improves it may be worth the trouble. Also, the const { someFunction, /* ... */ } list does not have to be exhaustive: it can contain only the functions you actually use from the add-on. So, if different files use different subsets of the add-on's functions, each file's const {/*...*/} need list only the functions used in that file.
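To check that destructuring changes only where the function reference is read from, not what gets called, here is a small sketch with a hypothetical plain object `myAddon` standing in for `require("my_addon")`:

```javascript
// Hypothetical stand-in for require("my_addon"); the real module exports
// native functions, but destructuring behaves the same way for any object.
const myAddon = {
  someFunction: (a) => `some:${a}`,
  someOtherFunction: (a) => `other:${a}`,
  yetAnotherFunction: (a) => `yet:${a}`,
  unusedFunction: () => "never imported below",
};

// Only pull out the functions this file actually uses.
const { someFunction, someOtherFunction } = myAddon;

// The locals hold the very same function objects as the properties,
// so later calls skip the property lookup on `myAddon`.
console.log(someFunction === myAddon.someFunction); // true
console.log(someFunction(1), someOtherFunction(2)); // some:1 other:2
```

One caveat: a destructured function is called without its object as `this`. That is harmless for functions that never read `this`, but one that does would need `.bind(myAddon)`.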

@twuky (Author) commented on Nov 12, 2021

I appreciate you taking the time to read this and respond. Following your suggestion, I tried using destructuring to import just the needed functions, but I can't observe any difference in performance. I had thought it might be a linear lookup issue; if so, wouldn't changing the order of the exports also affect performance? Does setting the exports from C++ guarantee their order? Before posting here, I had tried reordering the exports so that the most frequently called function was the first export (and later the last), and neither made a noticeable difference.
Out of interest, I also tried changing binding.js itself to export only the functions needed from the .node file, and didn't observe any improvement either.
I can confirm that the C++ wrappers for the functions actually used in the test case are written identically in the two versions of the binding, so this still puzzles me. The only difference I can see between the files is that one is much longer because it wraps and exports more functions.

@mhdawson (Member) commented on Nov 26, 2021

We discussed this again in today's team meeting. We are puzzled, but nobody has had a chance to look into it yet.

I'm not sure changing the order of the exports would make a difference, as I expect something like a hash table is used, so the order in which exports are added may not directly affect the lookup.

Do you have a simple/small recreate that shows the issue?
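On the question of whether export order is guaranteed: assuming `exports.Set()` in `Init` behaves like an ordinary JavaScript property assignment, the order is observable through enumeration. String keys are enumerated in insertion order, and integer-like keys come first in ascending order. That is enumeration order, not lookup order, so it says nothing about how quickly a single property is found. The export names below are hypothetical, borrowed from raylib.

```javascript
// Properties added in a chosen order, as Init() would add them with exports.Set().
const exportsObj = {};
exportsObj.DrawTexture = () => "draw";
exportsObj.LoadTexture = () => "load";
exportsObj.InitWindow = () => "init";
exportsObj["2"] = () => "integer-like key";

// String keys keep insertion order; integer-like keys are listed first.
console.log(Object.keys(exportsObj));
// [ '2', 'DrawTexture', 'LoadTexture', 'InitWindow' ]
```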

@twuky (Author) commented on Nov 30, 2021

I ended up finding a few places where I was using .ToNumber() on my Napi::Values where I should have been casting with .As<Napi::Number>(), which closed the gap by about half (now ~11% instead of 20%). So this leads me to think that other differences in how I implemented the two versions are at fault, though so far I have been hard-pressed to find anything else that would affect it, particularly in the draw-call function that is called the most. So I am inclined to agree that the size of the exports is not what causes the performance difference.
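For context on why that change could help: in node-addon-api, `Napi::Value::ToNumber()` calls `napi_coerce_to_number`, which implements the ECMAScript ToNumber operation and may run JavaScript code when given an object, while `As<Napi::Number>()` is, by default, an unchecked cast that does no conversion. The JavaScript-visible form of that coercion is `Number(x)` (for non-BigInt inputs). This sketch, with a hypothetical `tricky` object, shows the extra work it can do:

```javascript
// Count how often coercion runs user code.
let valueOfCalls = 0;
const tricky = {
  valueOf() {
    valueOfCalls++;
    return 42;
  },
};

// Number(x) applies the same abstract ToNumber operation that
// napi_coerce_to_number (behind Napi::Value::ToNumber()) implements.
const coerced = Number(tricky);       // 42, and valueOf ran once
const coercedStr = Number("  3.5  "); // 3.5, after parsing the string
const alreadyNumber = 7;              // a real number needs no conversion

console.log(coerced, coercedStr, alreadyNumber, valueOfCalls); // 42 3.5 7 1
```

When the JavaScript caller always passes numbers, the cast avoids one coercion call per argument; when it might not, the cast gives no type safety, so a check such as `IsNumber()` may still be wanted.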

I currently only have the repo I've been writing myself. I can include instructions for switching between the two versions if anyone would be generous enough to take a look. I made a quick file comparing the handwritten and code-generated versions of the functions used in my test case, in case it helps to see the direct differences: https://github.com/twuky/raylib-4.0/blob/master/src/comparison.cc

@deepakrkris commented on Dec 3, 2021

I am triaging this issue now.

@deepakrkris commented on Dec 31, 2021

@twuky, I used the following script to generate a very large binding file:

const path = require('path');
const fs = require('fs');

function generateFileContent(maxFns) {
  const content = [];
  const inits = [];
  const exports = [];

  for (let index = 0; index < maxFns; index++) {
    inits.push(`Value TestInt${index}(const CallbackInfo& info) { return Napi::Number::New(info.Env(), ${index}); }`);
    exports.push(`exports.Set("TestInt${index}", Function::New(env, TestInt${index}));`);
  }

  // content.push('#if (NAPI_VERSION > 5)');
  content.push("#define NAPI_EXPERIMENTAL");
  content.push('#include "napi.h"');
  content.push('using namespace Napi;');
  content.push('namespace {');

  inits.forEach(init => content.push(init));
  content.push('}');

  content.push('Object Init(Env env, Object exports) {');

  exports.forEach(exp => content.push(exp));

  content.push('return exports;');
  content.push('}');
  content.push('NODE_API_MODULE(addon, Init);');
  // content.push('#endif');

  return Promise.resolve(content.join('\r\n'));
}

function writeToBindingFile(content) {
  const generatedFilePath = path.join(__dirname, 'binding.cc');
  // Overwrite any previous binding.cc with the generated source.
  fs.writeFileSync(generatedFilePath, content);
  console.log('generated binding file ', generatedFilePath, new Date());
}

// 10000 functions, i.e. TestInt0 through TestInt9999.
generateFileContent(10000).then(writeToBindingFile);

@deepakrkris commented on Dec 31, 2021

@twuky, after running the above script to create a binding object with 10k functions, I saw no noticeable performance overhead when one of the functions was required and called as below:

const addon = require('./build/Release/addon.node');
console.log(addon.TestInt9999());

@deepakrkris commented on Dec 31, 2021

The generated large binding file looks like this:

#define NAPI_EXPERIMENTAL
#include "napi.h"
using namespace Napi;
namespace {
Value TestInt0(const CallbackInfo& info) { return Napi::Number::New(info.Env(), 0); }
Value TestInt1(const CallbackInfo& info) { return Napi::Number::New(info.Env(), 1); }
.
.
.
.
Value TestInt9998(const CallbackInfo& info) { return Napi::Number::New(info.Env(), 9998); }
Value TestInt9999(const CallbackInfo& info) { return Napi::Number::New(info.Env(), 9999); }
}
Object Init(Env env, Object exports) {
exports.Set("TestInt0", Function::New(env, TestInt0));
exports.Set("TestInt1", Function::New(env, TestInt1));
exports.Set("TestInt2", Function::New(env, TestInt2));
.
.
.
.
exports.Set("TestInt9998", Function::New(env, TestInt9998));
exports.Set("TestInt9999", Function::New(env, TestInt9999));
return exports;
}
NODE_API_MODULE(addon, Init);

@deepakrkris commented on Dec 31, 2021

@twuky, in my experience, type conversions in the C++ layer, especially things like converting uint arrays or buffers to strings, add to the performance cost.

@deepakrkris commented on Dec 31, 2021

In one case, with a node-java bridge addon, passing a uint array from JavaScript and converting it into a string in the bridged Java code caused a 10x increase in load. We have to look case by case to pin down a specific performance overhead.
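A rough JavaScript-side sketch of that kind of conversion cost, assuming Node's `Buffer` is used to turn a `Uint8Array` into a string (the helpers `bytesToString` and `timeConversions` are hypothetical). Each conversion builds a new string from all the bytes, so its cost grows with the array length:

```javascript
const { performance } = require("perf_hooks");

// Buffer.from(arrayBuffer, offset, length) creates a view without copying;
// toString("latin1") then builds a brand-new JS string from the bytes.
function bytesToString(bytes) {
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString("latin1");
}

// Time `reps` conversions of an n-byte array; also return the total
// string length so the converted strings are actually used.
function timeConversions(n, reps) {
  const bytes = new Uint8Array(n).fill(65); // n copies of the byte for "A"
  let totalLength = 0;
  const t0 = performance.now();
  for (let i = 0; i < reps; i++) totalLength += bytesToString(bytes).length;
  return { ms: performance.now() - t0, totalLength };
}

for (const n of [1000, 100000, 1000000]) {
  const { ms } = timeConversions(n, 100);
  console.log(`${n} bytes x 100 conversions: ${ms.toFixed(2)} ms`);
}
```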

@deepakrkris commented on Jan 3, 2022

I ran a performance test and found that the first call to an addon function does take more time with a larger module, but after that the results were always the same, regardless of whether the exported module was small or large:

const { performance, PerformanceObserver } = require('perf_hooks');
const addon = require('./build/Release/addon.node');

const perfObserver = new PerformanceObserver((items) => {
    items.getEntries().forEach((entry) => {
      console.log(entry)
    })
})

perfObserver.observe({ entryTypes: ["measure"], buffered: true })

performance.mark("loop-start")
for (let i=0 ; i<10; i++) {
    performance.mark("call-start-"+i)
    // console.log runs inside the timed region, so each measurement includes its I/O
    console.log(addon.TestInt1());
    performance.mark("call-end-"+i)
}
performance.mark("loop-end")

performance.measure("total call", "loop-start", "loop-end")

for(let i=0; i<10; i++)
 performance.measure("individual calls" + i, "call-start-" + i, "call-end-" + i)
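A variation that separates the one-off first-call cost from the steady-state cost, keeping `console.log` out of the timed region. `callUnderTest` is a hypothetical stand-in; in the real test it would be `addon.TestInt1`.

```javascript
const { performance } = require("perf_hooks");

// Hypothetical stand-in; replace with e.g. addon.TestInt1 from the addon.
const callUnderTest = () => 1;

// Time a single call, in milliseconds.
function timeOnce(fn) {
  const t0 = performance.now();
  fn();
  return performance.now() - t0;
}

const firstCallMs = timeOnce(callUnderTest);

// Average over many later calls; results are summed so the calls are not dead code.
const runs = 100000;
let sink = 0;
const t0 = performance.now();
for (let i = 0; i < runs; i++) sink += callUnderTest();
const steadyMeanMs = (performance.now() - t0) / runs;

console.log(`first call:        ${firstCallMs.toFixed(4)} ms`);
console.log(`steady-state mean: ${steadyMeanMs.toFixed(6)} ms (sink=${sink})`);
```

Comparing `firstCallMs` and `steadyMeanMs` between the small and large modules keeps the first-call effect described above from skewing the per-call average, which is the figure that matters for a per-frame draw loop.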

@deepakrkris commented on Jan 3, 2022

Performance entry for the first call to a function in a smaller module:

PerformanceEntry {
  name: 'individual calls0',
  entryType: 'measure',
  startTime: 70.427466,
  duration: 2.197071
}

Performance entry for the first call to a function in a larger module:

PerformanceEntry {
  name: 'total call',
  entryType: 'measure',
  startTime: 58.891178,
  duration: 6.405693
}

I have repeated the test locally (OSX, Node.js 14.0) many times, and the results are always consistent.

@KevinEady (Contributor) commented on Feb 11, 2022

Hi @twuky ,

As you can see from @deepakrkris's analysis above, there was no noticeable difference in executing functions in small and large bindings, except for a small difference when calling a function for the first time; this is expected. Do you have any remaining concerns about your code? If you believe there are further issues, could you provide a fully reproducible repo so we can investigate your code directly?
