Understanding npm

npm is the default package manager for Node.js. It was initially created to help developers manage dependencies for their JavaScript applications. Born as an open source project in 2009, npm features a package registry and client that enables developers to consume and distribute open source code.

Since its creation, npm's role has expanded to meet the broader needs of the JavaScript and Node.js developer community, and it now covers the management of front-end web applications, mobile applications, and other JavaScript development tools and frameworks. Today, the npm project and registry are hosted and managed as a free service by npm, Inc., supporting more than a billion downloads a month.

This visualization is intended to help you understand the role and scope of npm.

npm Registry

This is the npm registry. Every time you download a package using npm install, you're downloading it from here – it's essentially a huge database containing each package's files and associated metadata.
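As a rough sketch of where that data lives: assuming the public registry at registry.npmjs.org, each package's metadata document can be addressed by name. The function name below is hypothetical and nothing is actually fetched.

```typescript
// Base URL of the public npm registry (assumed here; private registries differ).
const REGISTRY_BASE = "https://registry.npmjs.org";

// Build the URL of a package's metadata document.
// Scoped names such as "@scope/name" contain a slash, which is
// percent-encoded so the name stays a single path segment.
function registryMetadataUrl(packageName: string): string {
  const encoded = packageName.replace("/", "%2F");
  return `${REGISTRY_BASE}/${encoded}`;
}

// Example: the metadata location for a package mentioned later in this article.
const coreUtilIsUrl: string = registryMetadataUrl("core-util-is");
```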

Packages

This is a package. Each package contains a little bit of JavaScript serving a particular purpose, and has useful metadata associated with it as well: its license type, a short description, GitHub repository, and so on.

npm has a lot of packages. Right now, there are 142,000 of them that have been published by the community.

Registry Growth

npm is growing significantly every day. Right now, about 5,925 new packages are published per week. Out of everything in the registry, here are 32,768 of npm's most downloaded packages.
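To put the growth figures in perspective, here is a small back-of-the-envelope calculation. It assumes the two numbers quoted in this article (142,000 packages in total, 5,925 new ones per week) describe the same moment in time.

```typescript
// Figures quoted in the article.
const totalPackages = 142000;
const newPackagesPerWeek = 5925;

// Average number of new packages per day: roughly 846.
const newPackagesPerDay = newPackagesPerWeek / 7;

// Weekly growth relative to the current registry size: roughly 4.2%.
const weeklyGrowthPercent = (newPackagesPerWeek / totalPackages) * 100;
```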

Downloads

Lots of packages means lots of downloads, too. In the last week alone, there have been 353,296,858 package downloads from the registry.

To give you an idea of what that's like, here's an illustration of each package downloaded at the current rate: each beam fired is roughly equivalent to 5 downloads.
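The rate behind that illustration can be worked out from the weekly total. This sketch assumes downloads are spread evenly across the week, which real traffic is not.

```typescript
// Weekly download total quoted in the article.
const weeklyDownloads = 353296858;

// Each beam stands for roughly 5 downloads.
const DOWNLOADS_PER_BEAM = 5;

// 7 days * 24 hours * 60 minutes * 60 seconds = 604,800 seconds.
const SECONDS_PER_WEEK = 7 * 24 * 60 * 60;

// Average downloads per second: roughly 584.
const downloadsPerSecond = weeklyDownloads / SECONDS_PER_WEEK;

// Average beams per second: roughly 117.
const beamsPerSecond = downloadsPerSecond / DOWNLOADS_PER_BEAM;
```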

Package Quality

npm's greatest strength, its size, is also its greatest weakness. There's too much code out there to conceivably digest on your own. However, there are a few quick checks you can make to narrow down the list of potential packages to use.

Package Quality

Let's separate the packages that are missing a readme. A readme is the central point of documentation for any package published to npm, and is more or less essential if you’d like to learn how it works.

Next, let's separate any packages that are missing a license field from their package.json. Unlicensed packages default to being copyrighted by the author, and as such are unusable in many commercial environments without permission.

It's important that packages link to a git repository that holds their original source. Without that, it becomes more difficult to review the code and submit improvements.

Finally, most packages need some kind of test script to survive changes submitted by the community. That isn't true of every package, but the presence of a test script can be a useful metric for many kinds of packages.

After removing all of these from our original 32,768 packages, what remains is the set that meets our criteria for "good quality modules": just a small fraction of our original sample.
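The four checks (readme, license, repository, test script) can be sketched as a simple filter. This is a minimal illustration, not the code behind the visualization: the `PackageMeta` shape and the function names are assumptions, and treating the placeholder test script written by `npm init` as "no tests" is a choice made here.

```typescript
// Hypothetical, simplified shape of the metadata the checks need.
interface PackageMeta {
  name: string;
  readme?: string;
  license?: string;
  repository?: string | { type?: string; url?: string };
  scripts?: Record<string, string>;
}

// The placeholder test script that `npm init` writes by default.
const PLACEHOLDER_TEST = 'echo "Error: no test specified" && exit 1';

function hasReadme(pkg: PackageMeta): boolean {
  return (pkg.readme ?? "").trim().length > 0;
}

function hasLicense(pkg: PackageMeta): boolean {
  return (pkg.license ?? "").trim().length > 0;
}

function hasRepository(pkg: PackageMeta): boolean {
  const repo = pkg.repository;
  if (repo === undefined) return false;
  // package.json allows either a shorthand string or an object with a url.
  if (typeof repo === "string") return repo.trim().length > 0;
  return (repo.url ?? "").trim().length > 0;
}

function hasTestScript(pkg: PackageMeta): boolean {
  const scripts = pkg.scripts;
  if (scripts === undefined) return false;
  const test = (scripts["test"] ?? "").trim();
  return test.length > 0 && test !== PLACEHOLDER_TEST;
}

// A package passes only if all four checks succeed.
function passesQualityChecks(pkg: PackageMeta): boolean {
  return hasReadme(pkg) && hasLicense(pkg) && hasRepository(pkg) && hasTestScript(pkg);
}

// Illustrative sample data (made up for this example).
const samplePackages: PackageMeta[] = [
  {
    name: "well-kept-example",
    readme: "# well-kept-example\nUsage notes...",
    license: "MIT",
    repository: { type: "git", url: "https://example.com/well-kept-example.git" },
    scripts: { test: "node test.js" },
  },
  { name: "no-license-example", readme: "Some docs", scripts: { test: PLACEHOLDER_TEST } },
];

const goodQuality: PackageMeta[] = samplePackages.filter(passesQualityChecks);
```

In this sample only `well-kept-example` survives the filter; the second entry fails the license, repository, and test script checks.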

Scatterplots

There's a lot more insight yet to be gained from the registry, though. Let's take a look at the data in a more traditional format: the scatterplot.

Downloads and Activity

If we plot each package's weekly download count against its last publish date, we can see that recently updated packages appear to be much more popular overall. It's also worth noting that the vast majority of packages are dwarfed in their usage by a select few.

Stars and Activity

We can see a similar pattern by plotting out a project's GitHub stars instead of downloads, although there is a wider margin between the majority and the most popular packages.

Invisible Utilities

If we compare downloads and stars together, there's little correlation between usage (downloads) and visibility (stars), implying that useful packages can be "invisible" utilities.

For example: core-util-is is downloaded 1,652,405 times a week, but only has 39 stars on GitHub. Conversely, hover.css has 9,608 stars but only 30 downloads per week.
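One way to make that gap concrete is a downloads-per-star ratio. This is an illustrative metric chosen for this sketch, not one the visualization itself uses; the figures are the two examples quoted above.

```typescript
interface Popularity {
  name: string;
  weeklyDownloads: number;
  stars: number;
}

// The two examples quoted in the article.
const examples: Popularity[] = [
  { name: "core-util-is", weeklyDownloads: 1652405, stars: 39 },
  { name: "hover.css", weeklyDownloads: 30, stars: 9608 },
];

// Weekly downloads per GitHub star; a package with no stars is treated
// as having one to avoid dividing by zero.
function downloadsPerStar(p: Popularity): number {
  return p.weeklyDownloads / Math.max(p.stars, 1);
}

// core-util-is: roughly 42,369 downloads per star.
// hover.css: roughly 0.003 downloads per star.
const ratios: number[] = examples.map(downloadsPerStar);
```

A very high ratio suggests an "invisible" utility that is pulled in mostly as a dependency, while a very low one suggests a project that is widely noticed but rarely installed through npm.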

We can divide our data into categories for a deeper understanding of behaviour specific to certain authors and communities.

Let's break things up based on license choice and look at dependency counts over time. You'll notice that WTFPL projects tend to use fewer dependencies, and that ISC only became popular when it became npm's default license choice.

If we look at downloads on a logarithmic scale we can see that MIT packages are more popular overall, and GPL-type licenses are slightly less popular amongst users than other choices.
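A logarithmic scale matters here because download counts span several orders of magnitude. The sketch below maps a download count onto an axis of a given length; the function name and the `+ 1` offset (so that zero downloads maps to zero) are assumptions made for this example.

```typescript
// Map a download count to a position on a log-scaled axis.
// Adding 1 before taking the logarithm keeps zero downloads at position 0.
function logPosition(downloads: number, maxDownloads: number, axisLength: number): number {
  const max = Math.log10(maxDownloads + 1);
  if (max === 0) return 0;
  return (Math.log10(downloads + 1) / max) * axisLength;
}

// Using the download counts quoted earlier (30 and 1,652,405 per week)
// on an axis 100 units long:
const smallOnLog = logPosition(30, 1652405, 100); // roughly 24
const smallOnLinear = (30 / 1652405) * 100; // roughly 0.002
```

On a linear axis the smaller package would be indistinguishable from zero; on the logarithmic axis it sits about a quarter of the way along.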

Communities tend to share a package name prefix to identify themselves, for example grunt-contrib-watch and gulp-autoprefixer are Grunt and Gulp plugins respectively. Here we have our scatterplots grouped into npm's 48 most used prefixes.
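Grouping by prefix can be sketched as follows, assuming the prefix is simply the text before the first hyphen. That rule and the function names are assumptions for illustration; the sample list mixes the two names above with other illustrative entries.

```typescript
// Return the text before the first hyphen, or null if there is none.
function prefixOf(name: string): string | null {
  const i = name.indexOf("-");
  return i > 0 ? name.slice(0, i) : null;
}

// Count how many package names share each prefix.
function countByPrefix(names: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const name of names) {
    const prefix = prefixOf(name);
    if (prefix === null) continue;
    counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  return counts;
}

// Return the `limit` most used prefixes, most frequent first
// (ties broken alphabetically).
function topPrefixes(names: string[], limit: number): [string, number][] {
  const pairs: [string, number][] = [];
  countByPrefix(names).forEach((count, prefix) => pairs.push([prefix, count]));
  pairs.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
  return pairs.slice(0, limit);
}

// Illustrative input only.
const sampleNames: string[] = [
  "grunt-contrib-watch",
  "grunt-contrib-uglify",
  "gulp-autoprefixer",
  "lodash",
];

const top: [string, number][] = topPrefixes(sampleNames, 48);
```

For the sample input, `grunt` comes first with two packages, `gulp` follows with one, and `lodash` is skipped because it has no prefix.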

You'll notice that react and ember packages rely on a lot of supporting tooling, judging by the number of development dependencies they use.

Or you can observe the relative activity of each community by plotting their total number of published versions per package on a logarithmic scale.

We're only scratching the surface here: npm is full of data waiting to be explored.