Chapter 4 Common patterns for licensing R

Disclaimer: This book is by no mean a legal book and should not be used as such. This book aims at helping you decipher the complexity of open source licenses, but if you have legal concerns and questions about open source licenses, please refer to a professional lawyer.

Before examining in more depth the link between your package and it environment, let’s remind that choosing a license is crucial in defining how your package / code can be used and reused. For example, MIT allows to do almost anything with your code (including copying and pasting your code in a paid, commercial, non-open software), while other licenses are less permissive.

4.1 R-package

In this first part, we’ll be talking about licensing R packages. Here are the several cases:

  • All packages need R / I’m making a package that only depends on {base}

  • I’m creating a package that has dependencies listed in the DESCRIPTION file, that might be installed at the same time as my package

  • I’m creating a package that has dependencies on internal files that are fully copied

  • I’m creating a package designed to access data

4.1.1 R & {base}

All packages need R / I’m making a package that only depends on base

  • Do you need to have a GPL-compatible license?

That’s a question your law department should answer, but here is the part from the GNU FAQ that might help you get an answer.

In the GPL FAQ, “Can I apply the GPL when writing a plug-in for a non free program?”, the FSF states that :

If the main program and the plugins are a single combined program then this means you must license the plug-in under the GPL or a GPL-compatible free software license and distribute it with source code in a GPL-compliant way. A main program that is separate from its plug-ins makes no requirements for the plug-ins.

Note also that CRAN allows packages that are licensed under a non GPL-compatible license: for example, Artistic License 1.0 or Common Public License Version 1.0, or the Lucent Public License

full_db %>%
  filter(str_detect(license, "Artistic-1")) %>%
  select(package, license)
full_db %>%
  filter(str_detect(license, "Common Public License Version 1.0")) %>%
  select(package, license)
full_db %>%
  filter(str_detect(license, "Lucent")) %>%
  select(package, license)

The CRAN policies states:

Packages with licenses not listed at https://svn.r-project.org/R/trunk/share/licenses/license.db will generally not be accepted.

Here is the list of these licenses:

download.file("https://svn.r-project.org/R/trunk/share/licenses/license.db", "licenses.dcf")
cran_l <- read.dcf("licenses.dcf") %>% 
  as_tibble() %>% 
  select(Name, Abbrev, Version)
cran_l
unlink("licenses.dcf")

Note that these 39 licenses are more than the 11 licenses officially listed on the r-project, and that some of these are not GPL-Compatible (for example Artistic License 1.0 or European Union Public License (EUPL) version 1.1)

Some other licenses are accepted, for example the “Unlimited” license:

full_db %>%
  filter(str_detect(license, "Unlimited")) %>%
  select(package, license)

We can also find this restriction concerning licensing of package in the CRAN policies:

The package’s license must give the right for CRAN to distribute the package in perpetuity.

4.1.2 Dependencies in DESCRIPTION

There are three ways you can list package dependencies in the DESCRIPTION:

  • Depends: the package is installed at the same time as your package, and it is loaded (attached to your search path) when you load your package. For example, the {ggraph} package has listed in Depends the {ggplot2} package, so when you do library(ggraph), it also attach {ggplot2} to the searchPath.
  • Imports: the package is installed at the same time as your package, and it is not loaded (attached to your search path) when you load your package. Yet you can use the functions from the package you’ve listed in your NAMESPACE. For example, the {ggraph} package has listed in Depends the {Rcpp} package. But when you do library(ggraph), {Rcpp} is not attached to your searchPath. Note that here, both {Rcpp} and {ggplot2} have to be installed when you install {ggraph}.
  • Suggests: the package is not installed at the same time as your package, and it is not loaded (attached to your search path) when you load your package. For example, the {ggraph} package has listed in Suggests the {network} package.

We will not classify each of this behavior, but be aware of these three behaviors. Whether or not they classify as one or two programs, or ether or not your package us a modified work of the dependencies is very much subject to interpretation and of the level of interactions between the packages.

That being said, we can assume that at least Depends and Imports do dynamic linking, as defined as

the executable The executable here being the package code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will load these objects/libraries as well, and perform a final linking.

And remember this the statement from the GPL FAQ about dynamic linking:

Linking a GPL covered work", statically or dynamically with other modules is making a combined work based on the GPL covered work. Thus, the terms and conditions of the GNU General Public License cover the whole combination.

But package interactions might also enter in the “borderline case” from this very same FAQ:

If the main program dynamically links plug-ins, but the communication between them is limited to invoking the ‘main’ function of the plug-in with some options and waiting for it to return, that is a borderline case.

On a more pragmatic level, let’s think for a minute about using a dependency which has restrictions, like one with the ACM license, preventing from any commercial use without explicit consent from the author. If one of these packages is listed in Depends or Imports, it will be installed with your package, and elements of the NAMESPACE of this package will be used when your package is used. So this kind of dependency might be problematic if you choose to go for a more lax license. For example, MIT allows to make commercial use of your code. And it might be understood that allowing your code to be used anywhere would contradict the restriction imposed by one of the needed dependencies.

In other word, if your package {mine} needs a restrictive package {restrictivepkg} to work, that means that when the end user will install {mine}, R will check if {restrictivepkg} is installed, and install it if necessary. When the user will do library(mine) in a commercial context, they will also be using the {restrictivepkg} in a commercial context, even if not explicitly. So if {restrictivepkg} doesn’t allow commercial use while {mine} do, it will conflict.

4.1.3 Including someone else code

I’m creating a package that has dependencies on internal files that are fully copied

Including someone ease’s code can cover several things:

  • adding a C file which is compiled with the package
  • adding a piece of R code from another package
  • adding code in another language

It’s a little bit more clear that you are doing static linking there:

Static linking is the result of the linker copying all library routines used in the program into the executable image.

As you are copying and pasting code from one work into yours, it’s more clear that you are doing static linking. Here, you should respect the license under which the code you’re using is licensed. For example, MIT allows to freely reused the code

Where code is copied (or derived) from the work of others (including from R itself), care must be taken that any copyright/license statements are preserved and authorship is not misrepresented.

Be careful: the constant in almost every open source licenses is that you should add the disclaimer of non liability in the code you’re using. So if ever you are copying another library, don’t forget to add in your license file what library you are using, and what the license of this specific piece of code is.

4.1.3.1 Bundling JavaScript files

In these cases, the files are used at run time, they are not compiled when the package is installed. Here are some examples of this behavior:

This package, designed to do interactive cartography, bundles a series of JavaScript files inside it, not written by the author of the R package. The license of the package is GPL-3, and it is combined with a LICENSE.note file, which starts as such:

The leaflet package as a whole is distributed under GPL-3 (GNU GENERAL PUBLIC LICENSE version 3).

The leaflet package includes other open source software components. The following is a list of these components (full copies of the license agreements used by these components are included below):

And then you’ll find a list of all the licenses for the various elements to be found in the package.

Also, the list of all authors from the various external elements is to be found in the DESCRIPTION

cat(packageDescription("leaflet")[["Author"]])
## Joe Cheng [aut, cre],
##   Bhaskar Karambelkar [aut],
##   Yihui Xie [aut],
##   Hadley Wickham [ctb],
##   Kenton Russell [ctb],
##   Kent Johnson [ctb],
##   Barret Schloerke [ctb],
##   jQuery Foundation and contributors [ctb, cph] (jQuery library),
##   Vladimir Agafonkin [ctb, cph] (Leaflet library),
##   CloudMade [cph] (Leaflet library),
##   Leaflet contributors [ctb] (Leaflet library),
##   Leaflet Providers contributors [ctb, cph] (Leaflet Providers plugin),
##   Brandon Copeland [ctb, cph] (leaflet-measure plugin),
##   Joerg Dietrich [ctb, cph] (Leaflet.Terminator plugin),
##   Benjamin Becquet [ctb, cph] (Leaflet.MagnifyingGlass plugin),
##   Norkart AS [ctb, cph] (Leaflet.MiniMap plugin),
##   L. Voogdt [ctb, cph] (Leaflet.awesome-markers plugin),
##   Daniel Montague [ctb, cph] (Leaflet.EasyButton plugin),
##   Kartena AB [ctb, cph] (Proj4Leaflet plugin),
##   Robert Kajic [ctb, cph] (leaflet-locationfilter plugin),
##   Mapbox [ctb, cph] (leaflet-omnivore plugin),
##   Michael Bostock [ctb, cph] (topojson),
##   RStudio [cph]

4.1.3.2 Bundling C / C++ / FORTRAN files

This format of file is a little bit different as it compiles the files when the package is installed. In other word, they are not “external calls”, they become a core of the resulting R package.

This is for example what we’ve seen with the {rngwell19937} package in the previous chapter:

readLines(
  system.file("LICENSE", package = "rngwell19937")
) %>% glue::as_glue()
## This code can be used freely for personal, academic, or non-commercial purposes.
## 
## For commercial purposes, see the conditions formulated in file rngwell19937/src/WELL19937a.c.

This .c file starting with:

download.file("https://cran.r-project.org/src/contrib/gpclib_1.5-5.tar.gz", "gpclib.tar.gz") 
untar("gpclib.tar.gz")
readLines(
  "gpclib/src/gpc.c"
)[1:30] %>% glue::as_glue()
## /*
## ===========================================================================
## 
## Project:   Generic Polygon Clipper
## 
##            A new algorithm for calculating the difference, intersection,
##            exclusive-or or union of arbitrary polygon sets.
## 
## File:      gpc.c
## Author:    Alan Murta (email: gpc@cs.man.ac.uk)
## Version:   2.32
## Date:      17th December 2004
## 
## Copyright: (C) 1997-2004, Advanced Interfaces Group,
##            University of Manchester.
## 
##            This software is free for non-commercial use. It may be copied,
##            modified, and redistributed provided that this copyright notice
##            is preserved on all copies. The intellectual property rights of
##            the algorithms used reside with the University of Manchester
##            Advanced Interfaces Group.
## 
##            You may not use this software, in whole or in part, in support
##            of any commercial product without the express consent of the
##            author.
## 
##            There is no warranty or other guarantee of fitness of this
##            software for any purpose. It is provided solely "as is".
## 
## ===========================================================================
unlink("gpclib.tar.gz")
unlink("gpclib", recursive = TRUE)

4.1.4 Data package

How to license a data package? In this GitHub issue, we can find three common patterns for building a data package: data bundled inside the package, data accessed through an API, data access for files that are online but not through an API.

4.1.4.1 Data bundled inside the package

This kind of behavior is close to the one from package bundling code written by someone else: if you’re packing data inside a package, then the global package should be compatible with the license of the dataset. And keep in mind that including a dataset inside a package that has a non compatible license can create breaking changes on the long run, when you have to either remove the dataset from the package, or switch the package license. This is for example what happened with the {tidytext} package: when first writing the package, Julia Silge and David Robinson included several datasets, and released the package as MIT. Later on, it turned out that some datasets included were not MIT-compatible.

We used the knowledge we had at the time and followed the practices we saw other OSS packages for text analysis following. However, it turns out that this was not the right approach from a licensing standpoint, because the lexicon licenses were not the same as the package’s license

So what to do next? As Julia describes in her blog post, she reached out to every dataset author to get there authorization (or not) to use the dataset in an MIT licensed package :

So, where does that leave us?

  • The sentiment lexicon of Bing Liu and collaborators is still a dataset within tidytext, as he has explicitly given permission for us to include it there. If you need a sentiment lexicon to use in another R package, this is the easiest one to use.
  • The AFINN lexicon is available through textdata, under an OSS license different from tidytext’s license.
  • The sentiment lexicon of Loughran and McDonald for financial documents is available through textdata for research purposes, but you must obtain a license from the creators for commercial use.
  • The NRC lexicon is currently not available through any package I am involved in.

So to sum up, two datasets have switched to the {textdata} package (see next section), one is still in {tidytext} as the author has explicitly given the authorization to do so, and one is not available.

Another example, the {igraphdata}, which is released under XX and has a file LICENSE with each licenses for each dataset listed:

readLines(
  system.file("LICENSE", package = "igraphdata")
) %>% glue::as_glue()
## This files lists the copyright holders and licenses (if known) of the
## various data sets in the igraphdata package:
## 
## Koenigsberg
## -----------
## 
## This data set was collected from Public Domain sources, and it is 
## int the Public Domain itself.
## 
## Please cite the following publication if you use this data set:
## 
## Euler, L.: Solutio problematis ad geometriam situs pertinentis.
## Commentarii academiae scientiarum Petropolitanae, 128-140, 1741.
## 
## 
## UKfaculty
## ---------
## 
## This dataset is licensed under a Creative Commons
## Attribution-Share Alike 2.0 UK: England & Wales License, 
## see http://creativecommons.org/licenses/by-sa/2.0/uk/ for details.
## 
## Please cite the following reference if you use this dataset:
## 
## Nepusz T., Petróczi A., Négyessy L., Bazsó F.: Fuzzy communities and
## the concept of bridgeness in complex networks. Physical Review E
## 77:016107, 2008.
## 
## USairports
## ----------
## 
## The data is from the Research and Innovative Technology Administration, 
## and it is in the Public Domain.
## 
## enron
## -----
## 
## The data was downloaded from http://www.cis.jhu.edu/~parky/Enron/
## Please cite the appropriate paper (see the manual) if you use
## this data set.
## 
## foodwebs
## --------
## 
## The data was downloaded from 
## http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm
## Please see the manual page of the data set for the publications 
## that you should cite, if you use some of this data.
## 
## immuno
## ------
## 
## The data was collected by David Gfeller. Please cite his PhD thesis if 
## you use this data set:
## 
## D. Gfeller, Simplifying complex networks: from a clustering to a
## coarse graining strategy, \emph{PhD Thesis EPFL}, no 3888, 2007.
## http://library.epfl.ch/theses/?nr=3888
## 
## karate
## ------
## 
## The data was extracted from the original publication.
## Please cite:
## 
## Wayne W. Zachary. An Information Flow Model for Conflict and
## Fission in Small Groups. Journal of Anthropological Research
## Vol. 33, No. 4 452-473
## 
## kite
## ----
## 
## The data set was extracted from the publication. Please cite:
## 
## David Krackhardt: Assessing the Political Landscape: Structure, Cognition,
## and Power in Organizations. Admin. Sci. Quart. 35, 342-369, 1990.
## 
## macaque
## -------
## 
## This dataset is licensed under a Creative Commons
## Attribution-Share Alike 2.0 UK: England & Wales License, 
## see http://creativecommons.org/licenses/by-sa/2.0/uk/ for details.
## 
## Please cite the following reference if you use this dataset:
## 
## Négyessy L., Nepusz T., Kocsis L., Bazsó F.: Prediction of the main
## cortical areas and connections involved in the tactile function of the
## visual cortex by network analysis. European Journal of Neuroscience
## 23(7):1919–1930, 2006. 
## 
## rfid
## ----
## 
## This data set is licensed under a GPL v3 license.
## 
## Please cite the following reference if you use this dataset:
## 
## P. Vanhems, A. Barrat, C. Cattuto, J.-F. Pinton, N. Khanafer,
## C. Regis, B.-a. Kim, B. Comte, N. Voirin: Estimating potential
## infection transmission routes in hospital wards using wearable
## proximity sensors. PloS One 8(9), e73970 306 (2013).
## 
## yeast
## -----
## 
## The data was downloaded from 
## http://www.nature.com/nature/journal/v417/n6887/suppinfo/nature750.html
## 
## Please cite the following paper if you use this data set:
## 
## Comparative assessment of large-scale data sets of protein-protein
## interactions. Christian von Mering, Roland Krause, Berend Snel,
## Michael Cornell, Stephen G. Oliver, Stanley Fields and Peer Bork.
## Nature 417, 399-403 (2002)

4.1.4.2 Data access for files that are online

A way to be able to serve data in a package is to choose a strategy used in the {textdata} package. The package by itself doesn’t include any dataset, yet whenever you want to use one of this dataset, you are prompted to accept the license of each dataset.

4.1.4.3 Data accessed through an API

It’s reasonable to think that accessing data through an API enters covers the same pattern as {textdata} does: the package does not carry the data, but it allows to connect to it and to transfer it. With this kind of package, you’re creating a tool that is used to convey data from external source to the user’s computer. So even if the tool by itself can be released under a different license from the one of the datasets, it will be easier to release the package under the same license as the data, so that the user doesn’t have to juggle between the two licenses when using your package.

4.2 About contribution

If you are opening your package to contribution (for example by sharing the source code on GitHub), you need to keep in ming that if one day you need to change the license, you’ll have to have the permission from all the contributors to the project. Indeed, every person adding code to the code-base has the copyright of this piece of code, and when they contribute they accept to release this piece of code under the same license as the one from the package. But this can become problematic when you have a large code-base and/or numerous contributors. A good example is the one from Bootstrap which we described in the second chapter: when the team wanted to switch to MIT, the had to collect an agreement from everybody who has ever contributed code to the framework, which was something like more than 500 peoples. That’s a complicated task.

And even more, as some piece of code were not MIT compatible or as the contributors didn’t want to change their contribution to MIT, some part of the code-base had to be rewritten. In the R-world, the same situation was encountered when changing the {covr} license (on a smaller scale of course).

One of the way to prevent that is to make contributors sign a contributor agreement, as {shiny} does: when you contribute to the package, you assign the copyright of your contribution to the package author, preventing any future claim and leaving the package author the freedom to use the code you’ve contributed.

4.3 Licensing what you’re writing

4.3.1 Online books & blogs

Don’t forget that everything you are writing is copyrighted under your name.

[C]opyright is automatically attached to every novel expression of an idea, whether through text, sounds, or imagery. (…) the words in this paragraph are protected by copyright as soon as they are written.

That means that everything you are publishing online is copyrighted: blog post, online book, GitHub pages… So as any other code-related work, your content should be licensed in a way that allows others to reuse (or not reuse) your work.

For example, the r4ds book is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License. This license allows to share and distribute copies of the work, but you mush give attribution, and you’re not allowed to use it for commercial purpose nor to create any derivative work.

Some other books are listed under CC BY-NC-ND 3.0 US: Text Mining with R, rOpenSci Packages: Development, Maintenance, and Peer Review, and Open Forensic Science in R.

The Advanced R book, on the other hand, is bi-licensed:

This work, as a whole, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The code contained in this book is simultaneously available under the MIT license; this means that you are free to use it in your own packages, as long as you cite the source.

Some other books are listed under CC BY-NC-SA 4.0: R Markdown: The Definitive Guide, Geocomputation with R, blogdown: Creating Websites with R Markdown, Data Science at the Command Line, Data Science Live Book and Hands-On Programming with R. This license allows to share and adapt the original work, provided you give attribution, that you don’t use the content for commercial purposes and that you share the content under the same license. This is also the license under which this book is released.

The same goes for blog posts: do add a license under your posts if you want visitors to be able to reuse the tips and tricks you’re sharing online. Here are, for example, various licenses for blogs:

  • rud.is blog is released under CC-BY-SA-4.0
  • colinfay.me is released under CC-BY-SA-4.0 with the code sections under MIT
  • masalmon.eu is All rights reserved

4.3.2 Publishing an article

When submitting an article to a journal, note that the code contained inside the article might be subject to specific license. For example, the Journal of Statistical software requires this:

“Code needs to include the GNU General Public license (GPL), versions GPL-2 or GPL-3, or a GPL-compatible license for publication in JSS.”

Submitting to the Journal of Open Source Software also have requirements regarding the published content:

Make your software available in an open repository (GitHub, Bitbucket, etc.) and include an OSI approved open source license.

And in the R Journal, the paper will be published in CC-BY

I acknowledge that if accepted my article will be published with CC-BY license

4.3.3 Various online content

You might not know this but if you’re publishing answers to Stackoverflow, you’re releasing code in CC BY-SA 3.0:

As noted in the Stack Exchange Terms of Service and in the footer of every page, all user contributions are licensed under Creative Commons Attribution-Share Alike. Proper attribution is required if you republish any Stack Exchange content.

In other word, you are free to reuse, share and adapt code from StackOverflow in your software (and let’s admit that we all do…), provided you give attribution… and that you distribute the result under the same license as the original.

It’s also interesting to note that StackOverflow’s Terms of Service come with a Disclaimer of Warranties. See https://stackoverflow.com/legal/terms-of-service/public, section 7.

One other place you can share code is GitHub gists platform. As defined by GitHub, gists are common repo, so they should be licensed as such. Though, it’s rather unpractical to add a license file every time you write a gist. Some users have chosen to add a license file that covers all the gists from their account:

“This license applies to all public gists @martinbuberl”

https://gist.github.com/martinbuberl/c0de29e623a1e34d1cda7e817d18bafe