Monthly Archives: December 2010

Natus in Fedora

My Fedora RPMs for Natus v0.1.3 were approved and are in the process of making it through updates-testing. If you are running Fedora 13 or 14 and would like to try out Natus, you may do so using one of the following procedures:

  1. Wait about 24 hours for the packages to hit the mirrors, enable updates-testing and do: yum install natus natus-python python-natus. This will install the natus shell and its core modules, as well as the bi-directional python bindings.
  2. If you just can’t wait to try Natus, simply download and install the RPMs from the links above.

Currently there are no packages in rawhide due to this build root bug.  As soon as that bug is fixed, I’ll make sure to get a rawhide build done as well.  Also, please leave feedback in bodhi for your respective build.  Happy hacking!

Announcing Natus v0.1.3 – “Unto us a son is given”

I’m pleased to announce that I’ve released Natus v0.1.3, which can be downloaded here. 0.1.3 is largely an API cleanup and new module release, with a few new APIs to boot.  Here are the most visible changes:

POSIX Module

There is now a posix module which wraps a good number of POSIX APIs.  Here is an example of how to use it for a common task, reading a file:

$ natus
Natus v0.1.3 – Using: SpiderMonkey
>>> posix = require('posix');
[object Object]
>>> fd = posix.open("test", posix.O_RDONLY);
3
>>> file = posix.read(fd, 4096);
Hello, this is my test file!
>>> posix.close(fd);
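
For comparison, the same open/read/close sequence maps one-to-one onto Python's os module, which wraps the same POSIX calls. This is only a sketch for readers who want to see the underlying calls outside Natus; the read_file helper, the file name and the file contents are placeholders, and the temporary directory exists only so the snippet is self-contained.

```python
import os
import tempfile


def read_file(path, size=4096):
    """Read up to `size` bytes using the raw POSIX-style calls."""
    fd = os.open(path, os.O_RDONLY)   # like posix.open("test", posix.O_RDONLY)
    try:
        return os.read(fd, size)      # like posix.read(fd, 4096)
    finally:
        os.close(fd)                  # like posix.close(fd)


# Create a throwaway file so the example runs anywhere.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test")
with open(path, "wb") as f:
    f.write(b"Hello, this is my test file!\n")

data = read_file(path)
print(data.decode())
```

Note that a single read returns at most the requested number of bytes, so a file larger than 4096 bytes would need a loop.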

Pre/Post Require Hooks

The basic idea here is that an application embedding Natus may wish either to provide a module internally or to modify a module after it’s imported but before it is exposed to the JavaScript sandbox.  A current example of this usage can be found in the Natus shell.  The problem we have is that the ‘system’ module is supposed to provide the ‘args’ property, which is an array of arguments.  However, this array is not supposed to be the actual argv that main() receives; it is supposed to filter out the interpreter and the interpreter’s own arguments.  This is highly context dependent.  What may be a sensible set of filters for the Natus shell may not be sensible for your application.  Nor do we want to expose a new API to store those values, since our model highly values recursion and we’d prefer not to add a reference like this to every object.  The answer is that we set up a post-require hook for the system module which puts the proper args into the array.  It looks a bit like this:

Value set_path(Value& module, string& name, string& reldir, vector<string>& path, void* misc) {
    const char** argv = (const char**) misc;
    Value ret = module.newUndefined();
    // Only the 'system' module is of interest to this hook.
    if (name != "system") return ret;
    Value args = module.get("args");
    if (!args.isArray()) return ret;

    // Copy the (already filtered) arguments into system.args.
    for (int i = 0; argv[i]; i++)
        args.push(module.newString(argv[i]));
    return ret;
}

int main(int argc, char** argv) {
    Engine engine;
    engine.initialize();
    Value global = engine.newGlobal();
    // Filter out argv…
    global.addRequireHook(true, set_path, argv);
}
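
To make the flow concrete without depending on libnatus, here is a toy model of a post-require hook in Python. Everything in it (TinyLoader, add_require_hook, set_args, the stand-in ‘system’ module and the filtering rule) is hypothetical and only mirrors the shape of the C++ above; it is not the Natus API.

```python
class TinyLoader(object):
    """Hypothetical stand-in for an embedding API with post-require hooks."""

    def __init__(self):
        # Module factories; 'system' starts out with an empty args array.
        self._modules = {"system": lambda: {"args": []}}
        self._post_hooks = []

    def add_require_hook(self, hook, misc):
        # Hooks run after a module is loaded but before the caller sees it.
        self._post_hooks.append((hook, misc))

    def require(self, name):
        module = self._modules[name]()
        for hook, misc in self._post_hooks:
            hook(module, name, misc)
        return module


def set_args(module, name, misc):
    """Post-require hook: fill system.args from the embedder's filtered argv."""
    if name != "system":
        return
    args = module.get("args")
    if not isinstance(args, list):
        return
    args.extend(misc)


# The embedder decides what counts as an interpreter argument. This sketch
# assumes argv[0] is the interpreter and argv[1] is one interpreter flag.
argv = ["interp", "--interp-flag", "script.js", "--user-flag"]
filtered = argv[2:]

loader = TinyLoader()
loader.add_require_hook(set_args, filtered)
system = loader.require("system")
print(system["args"])
```

The point of the design is that the filtering policy lives in the embedding application, while the module itself only ever sees the finished array.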

JSON API

Also new in 0.1.3 is a JSON API.  This convenience API lets you parse JSON to a set of Value objects and convert a tree of Value objects into a JSON string.  Just call val.toJSON() on any value object to get a JSON string for that object and its children.  You may also call val.fromJSON(myJSONString) to convert the JSON string back into a tree of values.

C API

Lastly, we added a C API which wraps our current C++ API.  This will more easily allow C-based application writers to use libnatus.  For more details see the natusc.h header.

Conclusion

For more details, please see the ChangeLog file.  Happy hacking!

Using Python from JavaScript with Natus

In yesterday’s post, I gave an example of how to use JavaScript from within Python. This is an important feature, enabling Python programmers to set up JavaScript environments and execute JavaScript in them.  But as I mentioned in my release announcement post, Natus’ Python bindings are bi-directional.  This means we can also access Python from JavaScript!  Note that Python exceptions are not only translated into JavaScript exceptions; we can also handle them just as we would any other JavaScript exception:

$ natus
Natus v0.2 – Using: SpiderMonkey
>>> python = require('python');
[object Object]
>>> python.import('os');
<module 'os' from '/usr/lib64/python2.7/os.pyc'>
>>> python.os.listdir('/');
['boot', 'dev', 'home', 'proc', 'sys', 'var', 'tmp', 'etc', 'root', 'selinux', 'lib64', 'usr', 'bin', 'lib', 'media', 'mnt', 'opt', 'sbin', 'srv', '.autorelabel', '.dbus', '.smolt', 'cgroup', '.autofsck']
>>> python.import('urlgrabber');
<module 'urlgrabber' from '/usr/lib/python2.7/site-packages/urlgrabber/__init__.pyc'>
>>> python.urlgrabber.urlread("http://npmccallum.fedorapeople.org/");
<html>
<head></head>
<body>
<h1>Hello!</h1>
</body>
</html>
>>> require('python').import('urlgrabber').urlread('http://testpage.com');
Uncaught Exception: [Errno 14] HTTP Error 403 : http://testpage.com
>>> try {
...     require('python').import('urlgrabber').urlread('http://testpage.com');
... } catch (e) {
...     x=1;
... }
1

Using JavaScript from Python with Natus

This is just a quick demo of how to use JavaScript from Python using Natus.  The basic idea here is that you create your engine and global and then you export Python objects into the Natus global environment.  Once you do this, you are able to call back into Python from JavaScript.

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import natus
>>> eng = natus.Engine()
>>> glb = eng.newGlobal()
>>> def myFunction():
...     print "myFunction"

>>> class Foo(object):
...     def printMe(self):
...         print "Foo"

>>> glb.func = myFunction
>>> glb.foo = Foo()
>>> glb.evaluate("""
...   func();
...   foo.printMe();
... """)
myFunction
Foo

Announcing Natus v0.1.2 — “For unto us a child is born…”

(If you’ve not read my last post, you’ll probably want to go do that now.)

I’m pleased to announce the first public release of Natus!  You can download 0.1.2 here.

About Natus

Natus is an MIT-licensed JavaScript meta-engine. Natus provides an engine-agnostic way to build an application on JavaScript or to build modules for use in Natus-based applications. “Ugh, an abstraction layer…” I hear you say? You’re both right and wrong. Natus does provide a great way to build an application without having to care about the underlying engine. However, Natus does not attempt to hide the engine from you. Further, Natus attempts to expose the best of each engine, without taking a least-common-denominator approach.  Here are some of the important details:

  • MIT licensed.
  • Full support for SpiderMonkey (1.8+), JavaScriptCore and V8 with common behavior across all three engines.
  • C and C++ APIs.
  • The C++ API is able to manage scope on the stack similar to V8, but without requiring HandleScope/Context::Scope.
  • Namespaced private pointers.
  • A shell which supports shebang, readline, history, and tab completion of the global hierarchy.
  • CommonJS module loader.
  • Full access to the underlying engine’s context and values.
  • Bi-directional Python bindings — Access Natus from Python (export objects, classes, etc); Access Python from Natus (load modules, run python code, etc)

Use Any Engine

Natus provides a method for applications to embed JavaScript without caring about the engine being used.  It means that applications that are written against one engine can trivially switch to using another engine should that engine evolve to fit the application’s needs better.  An example should probably prove my point:

int main() {
    Engine engine;
    engine.initialize("V8"); // Or SpiderMonkey… Or JavaScriptCore… Or none at all and let natus pick!
    Value global = engine.newGlobal();
    global.evaluate("alert('Hello World');");
}

Write Modules Once

Natus aims to provide an engine/framework agnostic API to write modules against.  This means that a MySQL driver written against Natus can be used in an application written using JavaScriptCore, SpiderMonkey or WebKit.

Pointer Security by Default

In Natus, all pointers are namespaced.  To set a pointer in a Value object, you would do this:

obj.setPrivate("python.PyObject", myPointer);

This ensures that one native module will never mistake a pointer as being (incorrectly) a certain type, preventing crashes due to object mishandling.
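
As a rough illustration of the idea, here is a toy model of namespaced private slots in Python. The Value class, its set_private/get_private methods and the two namespace strings are all hypothetical stand-ins chosen for this sketch; they are not the libnatus API, and real private pointers are raw native pointers rather than Python objects.

```python
class Value(object):
    """Toy object with namespaced private slots (hypothetical, not libnatus)."""

    def __init__(self):
        self._private = {}

    def set_private(self, namespace, pointer):
        self._private[namespace] = pointer

    def get_private(self, namespace):
        # A module only ever sees what was stored under its own namespace.
        return self._private.get(namespace)


class DbConnection(object):
    """Placeholder for a native database connection state."""


obj = Value()
obj.set_private("mysql.Connection", DbConnection())

# The XML module asks for its own namespace and gets nothing back, instead of
# silently receiving a database connection and misusing it.
xml_state = obj.get_private("xml.ParserState")
db_state = obj.get_private("mysql.Connection")
print(xml_state is None, isinstance(db_state, DbConnection))
```

Because the lookup key carries the type's name, a module that receives the wrong object finds an empty slot and can fail cleanly.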

Looking to the Future

Although this is the first public release of Natus, we believe we have covered every possible use of a JavaScript API.  We are looking to stabilize the API soon-ish, so please review the API and let us know if you have any concerns!

Although we currently implement plain CommonJS module loading, we hope to very soon implement a cohesive module configuration management framework.  This attempts to solve the difficulty that module loading essentially brings multi-origin to a language that has traditionally been single-origin only.

Help Wanted!

We’d love help!  If you can:

  • Code
  • Write Documentation
  • Build a Website
  • Design a logo
  • Advocate
  • Do anything else you think might help…

Come talk to us!

On the Out-of-Browser JavaScript Ecosystem

So as of late JavaScript is really starting to become an interesting language outside of the browser (OOB).  There are dozens of projects with a variety of separate goals.  Although I can name probably 20-30 such projects off the top of my head, the short list is probably around five or so, including probably the most popular: NodeJS.  At the core of each of these projects is a module system.  Most implement the CommonJS module system.  The problem is that the CommonJS module system is barely a specification at all.  It merely defines a few JavaScript userspace details.

Yes, some very real problems are beginning to show.  It’s these that I’d like to discuss in detail.

Fragmentation

The biggest problem by far is fragmentation.  While NodeJS may be very cool, it’s locked into V8.  Similarly, Havoc’s HWF requires SpiderMonkey.  GNOME generally uses WebKit’s JavaScriptCore.  Similarly, GJS and Seed are built on SpiderMonkey and JavaScriptCore, respectively.  Each of the engines has certain pros and cons:

  • V8 – It’s fast.  However, it doesn’t work well on a variety of architectures.  Google also won’t commit to stability.  Its cool handle technique permits management of scope on the stack. Of course, this means no C API and “Can you say HandleScope?”
  • SpiderMonkey – Lots of features which the others don’t have, particularly “let,” “yield,” and E4X.  However, until recently it was slow (to my knowledge, trunk builds are now on par with V8).  Mozilla also won’t commit to regular unbundled releases. Works well on pretty much every architecture.
  • JavaScriptCore – Not as fast as V8 and recent SpiderMonkey. However, it is available in lots of places, and it works well on pretty much every arch.  It is also available by default on OS X and Qt4.  It also has considerable love in WebKit-GTK+.

In short, there is no clear “right choice” that works for everyone.  Unfortunately this means that bindings to various utilities have to be written three times.  Let’s consider just databases for a moment.  If you just consider the big open source ones (MySQL, PostgreSQL, SQLite) this means that drivers for each database must be written 3 times.  Yet, and here is the kicker, the JavaScript engine choice that each project has made above is entirely irrelevant to writing a database driver.  We haven’t even touched other important SQL databases, like Oracle and DB2, nor have we talked about the recent crop of NoSQL databases, many of which (like MongoDB, which has a native JavaScript API) are actually quite exciting when combined with OOB JavaScript.  Without even leaving the topic of databases, to support the most important ones we’d have to write 3 * n bindings, where n is the number of databases considered.  I can easily come up with 10 databases, and that is before counting bindings that aren’t databases.  You can see that dividing the work like this doesn’t make much sense.

Module Loaders and Fragmentation

If the fragmentation issue above weren’t bad enough, now we have to look at the details of each open source project’s module system.  We know that we have to write each module three times.  Well, sorry to break it to you, but it’s much worse than that. Although most of the OOB JavaScript projects implement the CommonJS module loader, this specification says nothing about how it should actually be implemented when it comes to native code.  The end result of this is that the native modules written for NodeJS are entirely incompatible with v8cgi or ejscript.  So let’s imagine that there are (conservatively) 10 different native module systems out there.  Applying this to the database equation we had above, to cover the most popular 10 databases, we have to write a driver for each one 10 times: 10 * 10 == 100. Okay, it might not be exactly a full re-write each time.  But still, it’s non-trivial.  It is at this point that I need to again bring up the fact that the particulars of module loading and JavaScript engines are entirely orthogonal to the purpose of providing a database driver.
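
The arithmetic above is simple enough to tabulate. The snippet below just multiplies the counts used in the text; the drivers_needed helper is a name made up for this sketch, and the last line shows the count under the assumption that a single shared module API existed.

```python
def drivers_needed(databases, targets):
    """Each database needs one driver per engine or native module system."""
    return databases * targets


print(drivers_needed(3, 3))    # MySQL, PostgreSQL, SQLite on three engines
print(drivers_needed(10, 3))   # ten databases on three engines
print(drivers_needed(10, 10))  # ten databases on ten native module systems
print(drivers_needed(10, 1))   # ten databases on one shared module API
```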

Module Loaders and Security

It is at this point we should look at how module loaders and JavaScript fit together.  JavaScript was designed as a sandboxed language.  This is in fact its greatest strength.  And while many people are excited just to break out of that sandbox, most fail to see that while this is a great feature, if not implemented correctly it undoes the greatest asset of JavaScript: the ability to run untrusted code.  Let’s return to the topic of connecting to a database.  A common technique in most languages is to copy your database username and password into your code in plaintext…  To start with, this is the wrong methodology.  But let’s toy with some of the implications.  The reason you copy your username and password into your code is that the database driver permits you to connect to any database, anywhere.  This is an appropriate design decision for a language that is designed for failure.  However, remember, JavaScript was designed to run untrusted code.  This is in fact the proper starting point for designing any language! Thus, OOB JavaScript should be designed for sandboxing even its modules.  Yet, to my knowledge, not a single OOB JavaScript framework has designed for that case.  What if there were an out-of-code way to say that only a given database module may be loaded, and that it may only connect to one server using a predesignated username and password?  For instance:

mysql = require("mysql");
mysql.connect();
// … Do your stuff

In the above example, the MySQL module can only connect to one database.  Let’s look at the problems this alleviates:

  • If your Apache server is misconfigured to allow text viewing of your source code, you have not given out any important security information.
  • The lifespan of your passwords cannot be tracked by looking at your revision control (which also may be mis-configured for world-read).
  • If someone gains access to your JavaScript runtime, they can’t go scanning for other databases.
  • You could potentially create a sandbox database to run untrusted code in.
  • You have a clean separation of configuration from code, and your language encourages this.

Thus, while CommonJS’s module specification includes an “exports” object so that symbols are not exported by default, this does not go far enough to ensure that the symbols that are exported are properly sandboxed.  The module framework should provide this.
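
One way to picture such an out-of-code policy is sketched below in Python. It is purely illustrative: the policy layout, the host name, the credentials, the SandboxedMySQL facade and this require() function are all invented for the example and do not describe any existing framework. The policy is inlined as a string only to keep the snippet self-contained; in practice it would live outside the source tree.

```python
import json

# Deployment-side policy, kept outside the application's source tree.
POLICY = json.loads("""
{
  "allowed_modules": ["mysql"],
  "mysql": {"host": "db.internal.example", "user": "app", "password": "s3cret"}
}
""")


class SandboxedMySQL(object):
    """Hypothetical driver facade: connect() takes no arguments at all."""

    def __init__(self, settings):
        self._settings = settings

    def connect(self):
        # A real driver would open a connection here; we only report the target.
        return "connected to %s as %s" % (
            self._settings["host"], self._settings["user"])


def require(name, policy=POLICY):
    """Hypothetical loader that enforces the policy before exposing a module."""
    if name not in policy["allowed_modules"]:
        raise ImportError("module %r is not permitted by policy" % name)
    if name == "mysql":
        return SandboxedMySQL(policy[name])
    raise ImportError("no such module %r" % name)


mysql = require("mysql")
print(mysql.connect())

try:
    require("postgresql")
except ImportError as exc:
    print(exc)
```

The application code never sees a host name or password, so leaking the source or the revision history leaks nothing about the database.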

Security and Private Pointers

Since JavaScript was designed as a sandboxed language and we literally have trillions of lines of untrusted code being run in browsers, any bridge to native code threatens the security of the JavaScript sandbox.  Private pointers are, in my mind, the greatest threat to this security model.  Yet, this is precisely how all of the JavaScript engines above are designed.  Consider this scenario: I’ve loaded into my sandbox a database driver and an XML parser, a pretty common scenario.  Both of these are implemented in native code, and each implementation has a context which is stored as a private pointer (a pointer embedded in the JavaScript object).  What happens when you pass the JavaScript object storing the database connection state to the XML parser?  Well, the XML native code retrieves the pointer and, believing it to be a pointer to an XML state object, uses it without any type safety. CRASH! The typical answer to this problem thus far has been type checking on the objects.*  The problem is that this puts the burden of security on the module implementers who, due to the sheer number and complexity of modules, are almost always guaranteed to get it wrong.

Conclusion

In short, the current OOB JavaScript ecosystem puts entirely too much burden on module developers, severely limiting the growth of the ecosystem.  Module developers shouldn’t have to:

  • Code for multiple engines (though the application writers may have reason to care).
  • Write modules for multiple module loaders.
  • Re-implement sandboxing techniques on their own.
  • Do type checking on every object to ensure that using a private pointer won’t crash the object.

As you may have guessed, I have some solutions to these problems.  But for that you’ll have to wait for my next post…

* – V8 provides at least some help in this regard by insisting that you install private pointers into slots.  However, this is an almost useless feature (as far as security is concerned) considering that almost every programmer puts their pointers into slot 0.