
Tuesday, 29 December 2020

From cartoon to interactive infographic: the sane way

Making cartoon representations (technically, vector graphics) in Adobe Illustrator is a lot of fun, whereas the very idea of building a cartoon out of line plots in Excel, Matlab, R, Plotly etc. is enough to drive anyone insane. Luckily, Illustrator images can be coloured from numerical data in an automated way, without plotting anything by hand in Excel. Here I discuss exporting the vector graphic and modifying it with D3.js in a Jupyter notebook.

Choropleth map

A graph is actually a poor analogy; a map is much better. A map with areas coloured according to a numerically defined colour scale is called a choropleth map. This is different from a heatmap overlaid on a map, because in the latter the map is a layer that is not itself altered. To make a choropleth map one needs to define the regions using the GeoJSON format, which is a discussion for another time. Suffice it to say that colouring a cartoon based on a dataset is a choropleth-map-like representation, although generally the word "infographic" will do.
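At its simplest, the colour scale of a choropleth map is a linear mapping from a value range onto a pair of colours. Here is a minimal Python sketch of that idea, assuming white for the minimum and red for the maximum (`value_to_rgb` is a made-up helper; in the browser D3 has its own scales for this):

```python
def value_to_rgb(value, vmin, vmax, low=(255, 255, 255), high=(255, 0, 0)):
    """Hypothetical helper: linearly interpolate between two RGB colours."""
    if vmax == vmin:
        t = 1.0  # degenerate range: treat every value as the maximum
    else:
        # clamp to [0, 1] so out-of-range values get the end colours
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    # round half up, like JavaScript's Math.round for non-negative numbers
    return tuple(int(lo + (hi - lo) * t + 0.5) for lo, hi in zip(low, high))


for v in (0, 50, 100):
    print(v, value_to_rgb(v, 0, 100))
# 0 (255, 255, 255)
# 50 (255, 128, 128)
# 100 (255, 0, 0)
```

The approach used later in this post is cruder (each series is scaled by its maximum), but the principle is the same: a number goes in, a colour comes out.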
There are many established ways to plot data on maps, such as cartograms, which distort the shape of the map in a variety of possible ways: mosaic cartograms (pixellated regions), hexbin cartograms (regions pixellated into tessellated hexagons), anamorphic cartograms (warped regions with contiguity maintained), isomorphic cartograms (scaled regions with contiguity broken) etc. Here I'll just be discussing a colour equivalent; cartogram equivalents are all possible too, but may require the GeoJSON approach.

Caveats

  • If it is a figure with half a dozen areas to be coloured only once for a given dataset, it is obviously better to do it manually.
  • I should apologise that, whereas I use British English in normal writing, when I code I use American English. This is not mandated by a Python PEP, but it does keep everything slightly more consistent, even if it is a tad weird. So please, no hate mail.

Illustrator

Illustrator, or Inkscape, is a vector graphics editor. Whereas a raster image is a grid of pixels, each with a colour value, a vector image is made of paths. The open format for vector images is SVG, which will be used here. A raster image can be saved as a BMP (the grid is stored as is), PNG (the grid is compressed without loss of information) or JPG (compressed with some loss of information). In science, a vector drawing, say of a plant or a protein complex, is referred to as a "cartoon".
As an example I will make three proteins, and I made them wrinkly:
  1. Menu bar: Effects > Distort > Roughen... (Smooth radio button)
  2. Menu bar: Object > Expand Appearance (optional normally, but required for things like Clipping Mask)
I also added duotone shading, as seen on Flaticon:
  1. Draw a right-angled triangle or a rectangle (foreground) in front of a rectangle (background) bigger than the protein
  2. Group the two (Right click: Group)
  3. Right click: Arrange > Send to back
  4. Select the two
  5. Menu bar: Object > Clipping Mask > Make ...
  6. Double click to enter object. Select border (formerly the front object)
  7. Toolbar: Default fill and stroke, blank the fill
  8. Exit object (button on ribbon under filename)
I also added a text label and, importantly, grouped it with the relevant objects. This will allow me to grab hold of the objects later. If no label is present, ids can be assigned in the Layers pane; if the pane is not shown, open it via the menu bar: Window > Layers. There the id of the object can be set (letters and underscores only, no spaces).

Once complete, export it as an SVG with inline styles, SVG fonts and layer names as IDs.
There are seven shape elements in SVG: line, path, polyline, circle, ellipse, rect and polygon. The text and g elements seen below are not technically shape elements, but they feature heavily. Additionally, there are filter, symbol, mask etc., but these are less common and should not be a concern here.
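To see that an SVG really is just text made of these elements, the sketch below parses a small hand-written SVG (a hypothetical stand-in for an Illustrator export, not the actual cartoon) with Python's standard-library ElementTree and lists the shape elements inside each g by id. This is only an offline illustration; in the notebook the selection is done with D3.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
SHAPES = {"line", "path", "polyline", "circle", "ellipse", "rect", "polygon"}

# Hypothetical, minimal stand-in for an exported cartoon
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" id="Layer_1" viewBox="0 0 100 50">
  <g id="protein_A">
    <circle cx="20" cy="25" r="15" style="fill:#FFFFFF;"/>
    <text x="10" y="45">protein A</text>
  </g>
  <g id="protein_B">
    <rect x="50" y="10" width="30" height="30" style="fill:#FFFFFF;"/>
    <path d="M50,10 L80,40 L50,40 Z" style="fill:#CCCCCC;"/>
  </g>
</svg>"""

root = ET.fromstring(svg_text)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


# map each group's id to the shape elements it contains (text is not a shape)
shapes_by_group = {}
for group in root.iter(SVG_NS + "g"):
    shapes = [local_name(el.tag) for el in group.iter() if local_name(el.tag) in SHAPES]
    shapes_by_group[group.get("id")] = shapes

print(shapes_by_group)
```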

This applies to complex objects too. For example, I could download the SVG of a university crest from Wikipedia, add reasonable IDs to certain parts and have some zany infographic that is hard to follow, or at least hard for me, as I have never seen the point of glyph plots with faces (Chernoff faces), say.

Jupyter

Open

Now open the SVG in a Jupyter notebook:

from IPython.display import display, SVG, Javascript, HTML

display(SVG('cartoon-01.svg'))
The function display is actually called behind the scenes on the last output of a cell (which is also stored in _). Calling it explicitly lets one show multiple pandas tables etc. from anywhere in the same cell. The thing displayed can be an SVG object, as here, a Javascript snippet or HTML code.

Cell magic

A great strength of a Jupyter notebook is that you can do more than just run Python or Julia snippets: there are many "cell magic" commands! The snippet below uses the %%javascript magic to run JavaScript that adds D3.js to the window namespace. Sure, require.js exists precisely to avoid polluting the global namespace, but this is not some background operation: it is for a human working interactively.
%%javascript
require.config({
    paths: {
        d3: 'https://d3js.org/d3.v5.min'
    }
});

require(['d3'], function(d3) {
    window.d3 = d3;
});
Loading D3 via a script element will not work in the notebook, although it does work in places like this blog post:
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.3.1/d3.min.js" integrity="sha512-9H86d5lhAwgf2/u29K4N5G6pZThNOojI8kMT4nT4NHvVR02cM85M06KJRQXkI0XgQWBpzQyIyr8LVomyu1AQdw==" crossorigin="anonymous"></script>
The D3.js library allows data-driven image manipulation and is very powerful, albeit counter-intuitive at times, because it is similar to JQuery but subtly different. Plotly, for example, uses it behind the scenes. However, as with everything JS-related, folk online will zealously announce that it is dead, whereas in reality it is "mature" (i.e. documented and not buggy, a concept alien to JS) and will stay around for a while. Their argument is that the selection logic, like JQuery's, can now be done in modern vanilla JS, and that many transformations can be achieved, with more effort, via CSS3. Even if vanilla JS would suffice for this simple example, please ignore these rabble-rousers.

From Python code


To pass Python data to JS we have to use display(Javascript(cmd_str)) in a Python cell, where cmd_str may contain some data. That data often cannot simply be the output of the __str__() magic method, for several reasons; for example, True, False and None are true, false and null in JS. It is safer to JSON-serialise the objects (e.g. with json.dumps), even when they are just a simple list/array or dictionary/object.
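To make the difference concrete, here is what str() and json.dumps() produce for the same dictionary. The data is made up, and cmd_str is only an illustration of the string one might hand to display(Javascript(...)); UpdateColors is the JS function defined further down this post.

```python
import json

# made-up example data containing Python-only literals
data = {"protein A": {"red": 255, "visible": True, "note": None}}

# str() gives Python syntax, which JS does not understand (True, None)
print(str(data))
# json.dumps() gives valid JSON, which JS reads fine as an object literal
print(json.dumps(data))

blank = {"red": 255, "green": 255, "blue": 255}
# illustrative command string: every piece of data goes through json.dumps
cmd_str = f"window.UpdateColors({json.dumps(data)}, {json.dumps(blank)});"
print(cmd_str)
```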

Theory: D3.js

Most life-science academics work in Python rather than JS, so it is worth a short digression about D3, nodes and the fill style.

Selection

The added SVG will have the id "Layer_1" unless the layer name was changed in Illustrator. There are ways to grab the output of the preceding Jupyter cell, but they are convoluted and total overkill. So, in a JS cell or in the browser console, this selects the SVG layer:
d3.select('#Layer_1')
We, however, want to iterate over the groups (the g elements):
d3.select('#Layer_1').selectAll("g").each(function(d, i) { /* ... */ });
Inside the callback function(d, i, nodes) { ... }, this is the DOM node of the current iteration, as in JQuery. It is the same as nodes[i], which is what to use in an arrow function, since arrow functions do not get their own this. So if the group contains a text element, its text would be picked up with:
const text = d3.select(this).select('text').text();
The method select(selectorString) selects the first descendant matching selectorString, while selectAll(selectorString) selects all matching descendants (akin to find(selectorString), not children(selectorString), in JQuery, or to querySelectorAll(selectorString) in vanilla JS). As with JQuery, the output of select/selectAll is not a DOM element but a D3 selector wrapping that element. To convert a DOM node (e.g. an element) to a D3 selector and vice versa one can do this:
let selector = d3.select(element);
let elementReturned = selector.node()
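The descendants-versus-children distinction matters because Illustrator nests groups (clipping groups, for instance), and selectAll('g') picks up the nested ones too. Python's ElementTree find/findall offer a rough analogy, sketched here on a made-up SVG (an analogy only, not D3 itself):

```python
import xml.etree.ElementTree as ET

NS = {"svg": "http://www.w3.org/2000/svg"}

# made-up SVG with a nested (clipping-style) group inside protein_A
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" id="Layer_1">
  <g id="protein_A">
    <g id="clip_group"><rect width="10" height="10"/></g>
    <text>protein A</text>
  </g>
  <g id="protein_B"><circle r="5"/></g>
</svg>"""
root = ET.fromstring(svg_text)

# like d3.select('#Layer_1').select('g'): the first matching descendant
first = root.find(".//svg:g", NS)
# like .selectAll('g') or querySelectorAll('g'): every matching descendant
all_groups = root.findall(".//svg:g", NS)
# like JQuery's children('g'): direct children only
children = root.findall("svg:g", NS)

print(first.get("id"))                     # protein_A
print([g.get("id") for g in all_groups])   # ['protein_A', 'clip_group', 'protein_B']
print([g.get("id") for g in children])     # ['protein_A', 'protein_B']
```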
Once the correct element is selected, it can be changed accordingly. The power of D3 comes from binding data to elements, which, er, does not get used here, but D3 still has some advantages over vanilla JS or JQuery, especially if the code moves beyond simple recolouring to rotations, scalings etc. That is why I am not simply using vanilla JS in this example: it is always wise to leave room to do something extra, and D3 is marginally quicker to type and follow.
To operate on elements that have an id attribute, as set in Illustrator, this.id will do it for a DOM node, while selector.attr('id') will work for a D3 (or JQuery) selector.

Say we passed from Python to JS a dictionary mapping text or id to a hex colour; we could use it to colour the elements. ...And this is where a lot of console testing happens (the browser's dev console is a great tool and well worth getting to know). To recap selections:
// Vanilla JS DOM element
let el = document.getElementById("someID");
// D3 selector
let sele = d3.select("#someID");

SVG fill

In inline CSS (the style attribute), the inner colour of a shape is controlled by the property fill, whereas for regular HTML elements color sets the text colour and background-color the background. SVG text elements are coloured with fill too: fill sets the colour of the glyphs, while stroke sets their outline.
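With Illustrator's inline-style export, the fill typically ends up inside the style attribute, e.g. style="fill:#FFFFFF;stroke:#000000;". Recolouring outside the browser therefore means editing that string. A minimal Python sketch, assuming simple declarations with no semicolons inside values (`set_style_property` is a hypothetical helper):

```python
def set_style_property(style, name, value):
    """Hypothetical helper: return a copy of an inline style string with one property set.

    Assumes no ';' appears inside any value (e.g. no data URIs).
    """
    props = {}
    for declaration in style.split(";"):
        key, sep, val = declaration.partition(":")
        if sep:
            props[key.strip()] = val.strip()
    props[name] = value  # replaces the property in place, or appends it if absent
    return ";".join(f"{k}:{v}" for k, v in props.items()) + ";"


print(set_style_property("fill:#FFFFFF;stroke:#000000;stroke-miterlimit:10;", "fill", "#FF0000"))
# fill:#FF0000;stroke:#000000;stroke-miterlimit:10;
```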

To change the CSS of a DOM node (vanilla JS element), set a property of its style property, e.g. el.style.fill = 'red';. With a D3 selector it is the method selector.style(propertyName, value). Before you get out your pitchfork to promote vanilla JS, I should mention that D3 has some great features for manipulating colours, which is more convenient than manipulating hex codes. For example
let color = d3.rgb(redInteger, greenInteger, blueInteger);
generates a colour object that can be passed to a D3 selector via .style('fill', color).
Shades, hues and tints of that colour can also be made easily; e.g. color.darker(1) gives a colour 30% darker, which fancy folk will call a shade (cf. figure).
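The arithmetic behind darker and brighter is simple: D3 multiplies each channel by 0.7 to the power k (darker) or divides by it (brighter), and only rounds and clamps to 0-255 when the colour is turned into a string. Here is a Python re-implementation of that arithmetic for checking values by hand (it is not D3 itself, and the function names are just for this sketch):

```python
DARKER = 0.7           # D3's per-step darkening factor
BRIGHTER = 1 / DARKER  # and its inverse for brightening


def _clamp_round(x):
    """Round half up and clamp to 0-255, as when a D3 colour is formatted."""
    return max(0, min(255, int(x + 0.5)))


def darker(rgb, k=1):
    factor = DARKER ** k
    return tuple(_clamp_round(c * factor) for c in rgb)


def brighter(rgb, k=1):
    factor = BRIGHTER ** k
    return tuple(_clamp_round(c * factor) for c in rgb)


print(darker((200, 100, 50)))    # (140, 70, 35)
print(brighter((200, 100, 50)))  # (255, 143, 71): red is clamped at 255
```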

Further afield

Although not used here, the D3 methods data, enter and exit are very useful and deserve an honourable mention.
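In a keyed data join (selection.data(data, key)), D3 splits things into three groups: update (data that already has an element), enter (data without an element, to be created) and exit (elements without data, to be removed). That bookkeeping can be sketched with plain Python lists as a conceptual model only; no D3 is involved and `data_join` is a made-up name:

```python
def data_join(existing_keys, data_keys):
    """Conceptual model of a keyed D3 data join (hypothetical helper).

    enter:  data with no matching element (elements to create)
    update: data with a matching element (elements to modify)
    exit:   elements with no matching datum (elements to remove)
    """
    existing = list(existing_keys)
    data = list(data_keys)
    existing_set, data_set = set(existing), set(data)
    enter = [k for k in data if k not in existing_set]
    update = [k for k in data if k in existing_set]
    exit_ = [k for k in existing if k not in data_set]
    return enter, update, exit_


elements = ["protein A", "protein B", "protein C"]
new_data = ["protein B", "protein C", "protein D"]
print(data_join(elements, new_data))
# (['protein D'], ['protein B', 'protein C'], ['protein A'])
```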
 

Putting it all together

Two parts are required: one to change the colours in JS and the other to pass the data to JS. The data is best passed as a dictionary whose keys correspond to the text/id and whose values are dictionaries with the keys red, green and blue, each an integer between 0 and 255. A list could work too, but this is more foolproof.
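If the colours start life as hex codes, converting them to this structure takes one int(..., 16) per channel. A small sketch; the input dictionary and the `hex_to_rgb_dict` helper are hypothetical:

```python
import json


def hex_to_rgb_dict(hex_colour):
    """Hypothetical helper: '#FF8000' -> {'red': 255, 'green': 128, 'blue': 0}."""
    h = hex_colour.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected a 6-digit hex colour, got {hex_colour!r}")
    return {"red": int(h[0:2], 16), "green": int(h[2:4], 16), "blue": int(h[4:6], 16)}


# keys are the label text (or the id), values the colours (made-up example)
colours = {"protein A": "#FF8000", "protein_B": "#00FF00"}
payload = {name: hex_to_rgb_dict(code) for name, code in colours.items()}
print(json.dumps(payload))
```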
So let's start by defining a JS function to change the colours based on the given data.
%%javascript

// function to recolour the labelled groups (the proteins)
window.UpdateColors = (data, blank) => {
    // ********************
    // make an object with name -> D3 element of g
    const items = {};
    d3.select('#Layer_1').selectAll("g").each(function(d, i) {
        const groupSele = d3.select(this);
        const textSele = groupSele.select('text');
        const text = textSele.size() ? textSele.text() : groupSele.attr('id');
        // skip nested groups (e.g. clipping groups) that have neither a label nor an id
        if (text) items[text] = groupSele;
    });
    // spy on object?
    //console.log(JSON.stringify(items));
    // ********************
    // change fill color
    const blankcolor = d3.rgb(blank.red, blank.green, blank.blue); // 255,255,255 is white
    Object.keys(items).forEach((itemName) => {
        let color = data[itemName] !== undefined ? d3.rgb(data[itemName].red, data[itemName].green, data[itemName].blue) : blankcolor;
        // select all that have a fill already.
        let shapes = items[itemName].selectAll('line,path,polyline,circle,ellipse,rect,polygon')
                                    .filter((d, i, nodes) => d3.select(nodes[i]).style('fill') !== 'none');
        // color all
        shapes.style('fill', color);
        // color first lighter
        //shapes.filter((d, i, nodes) => i === 0).style('fill', color.brighter(1));
        // color last darker
        shapes.filter((d, i, nodes) => i=== nodes.length - 1).style('fill', color.darker(1));
    })
}
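If the browser is not needed, the same idea (label or id to colour, every shape whose fill is not none recoloured, the last one 30% darker) can be sketched in Python with ElementTree, writing the result back out as SVG text. This is an offline alternative, not what the notebook does; it assumes Illustrator-style inline fills, ignores fills set via CSS classes, and all names are made up for the sketch:

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG)  # serialise as plain <svg>, not <ns0:svg>
SHAPES = {f"{{{SVG}}}{name}" for name in
          ("line", "path", "polyline", "circle", "ellipse", "rect", "polygon")}


def get_fill(el):
    """Fill from the inline style attribute, else the fill attribute, else None."""
    for decl in (el.get("style") or "").split(";"):
        key, sep, value = decl.partition(":")
        if sep and key.strip() == "fill":
            return value.strip()
    return el.get("fill")


def set_fill(el, hex_colour):
    """Set fill inside the style attribute, keeping the other declarations."""
    kept = [d for d in (el.get("style") or "").split(";")
            if d.partition(":")[1] and d.partition(":")[0].strip() != "fill"]
    el.set("style", ";".join(kept + [f"fill:{hex_colour}"]) + ";")


def recolour(svg_text, data, blank=(255, 255, 255)):
    """Colour labelled groups from data {name: (r, g, b)}; the last shape 30% darker."""
    root = ET.fromstring(svg_text)
    for group in root.iter(f"{{{SVG}}}g"):
        label = group.find(f".//{{{SVG}}}text")
        name = "".join(label.itertext()).strip() if label is not None else group.get("id")
        if not name:
            continue  # unnamed nested group, e.g. a clipping group
        rgb = data.get(name, blank)
        # like the JS filter: skip shapes whose fill is explicitly 'none'
        shapes = [el for el in group.iter() if el.tag in SHAPES and get_fill(el) != "none"]
        for i, el in enumerate(shapes):
            if i == len(shapes) - 1:
                rgb_i = tuple(min(255, int(c * 0.7 + 0.5)) for c in rgb)  # like darker(1)
            else:
                rgb_i = rgb
            set_fill(el, "#{:02X}{:02X}{:02X}".format(*rgb_i))
    return ET.tostring(root, encoding="unicode")


svg_text = """<svg xmlns="http://www.w3.org/2000/svg" id="Layer_1">
  <g>
    <rect width="10" height="10" style="fill:#FFFFFF;"/>
    <path d="M0,0 L10,10 L0,10 Z" style="fill:#CCCCCC;stroke:#000000;"/>
    <text>protein A</text>
  </g>
  <g id="protein_B"><circle r="5" style="fill:none;stroke:#000000;"/></g>
</svg>"""
out = recolour(svg_text, {"protein A": (200, 100, 50)})
print(out)
```

The rect becomes #C86432, the path (the last shape) #8C4623, and the circle keeps fill:none.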
It would also be nice to download the image. This is unnecessary in Firefox, but needed elsewhere, so here is a snippet taken from a Stack Overflow answer.
%%javascript
window.downloadSVG = (idName) => {
    //get svg element.
    var svg = document.getElementById(idName);

    //get svg source.
    var serializer = new XMLSerializer();
    var source = serializer.serializeToString(svg);

    //add name spaces.
    if(!source.match(/^<svg[^>]+xmlns="http\:\/\/www\.w3\.org\/2000\/svg"/)){
        source = source.replace(/^<svg/, '<svg xmlns="http://www.w3.org/2000/svg"');
    }
    if(!source.match(/^<svg[^>]+"http\:\/\/www\.w3\.org\/1999\/xlink"/)){
        source = source.replace(/^<svg/, '<svg xmlns:xlink="http://www.w3.org/1999/xlink"');
    }

    //add xml declaration
    source = '<?xml version="1.0" standalone="no"?>\r\n' + source;

    //convert svg source to URI data scheme.
    var url = "data:image/svg+xml;charset=utf-8,"+encodeURIComponent(source);

    //set url value to a element's href attribute.
    document.getElementById("downloadSVG").href = url;
    //you can download svg file by right click menu.
}
And, somewhere sensible, the download link. It could be styled as a button, since the Jupyter notebook uses Bootstrap 3, but that would be showing off...
display(HTML("""<a id="downloadSVG" onclick="window.downloadSVG('Layer_1')" download>download</a>"""))
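What downloadSVG does to the markup can be reproduced in Python, which is handy for checking the output or for saving the file from Python instead. A sketch that mirrors the snippet's steps (it is not the Stack Overflow code itself, and `svg_data_uri` is a made-up name); urllib.parse.quote with the safe characters below escapes the same characters as JS's encodeURIComponent:

```python
from urllib.parse import quote, unquote

SVG_NS = 'xmlns="http://www.w3.org/2000/svg"'
XLINK_NS = 'xmlns:xlink="http://www.w3.org/1999/xlink"'


def opening_tag(source):
    """The '<svg ...>' start tag, i.e. everything up to the first '>'."""
    return source[: source.index(">") + 1]


def svg_data_uri(source):
    """Mirror downloadSVG: add missing namespaces, prepend an XML declaration, URI-encode."""
    if not source.startswith("<svg"):
        raise ValueError("expected markup starting with <svg")
    if SVG_NS not in opening_tag(source):
        source = source.replace("<svg", "<svg " + SVG_NS, 1)
    if '"http://www.w3.org/1999/xlink"' not in opening_tag(source):
        source = source.replace("<svg", "<svg " + XLINK_NS, 1)
    source = '<?xml version="1.0" standalone="no"?>\r\n' + source
    # quote() never escapes letters, digits and "_.-~"; adding "!*'()" as safe
    # leaves exactly the characters that encodeURIComponent leaves unescaped
    return "data:image/svg+xml;charset=utf-8," + quote(source, safe="!*'()")


uri = svg_data_uri('<svg id="Layer_1"><rect width="1" height="1"/></svg>')
decoded = unquote(uri.split(",", 1)[1])
print(decoded)
```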
Now we can be fancy in Python. Say we have one or more pandas.Series by which we wish to colour the items; we could have something like this:
import json

import pandas as pd
from IPython.display import display, Javascript


def update_colors(reds=pd.Series(dtype=int),
                  greens=pd.Series(dtype=int),
                  blues=pd.Series(dtype=int),
                  white_blank=True):

    # scale so that the maximum becomes 255; an empty series gives no values
    normalise = lambda color_series: (color_series / color_series.max() * 255).astype(int) if str(color_series.max()) != 'nan' else {}
    # every name in any of the three series, in order (pandas rejects a set as index)
    keys = list(dict.fromkeys([*reds.keys(), *greens.keys(), *blues.keys()]))

    colors = pd.DataFrame(data=[[0, 0, 0]] * len(keys),
                          index=keys,
                          columns=['red', 'green', 'blue'])

    for gene_name, color in normalise(reds).items():
        colors.at[gene_name, 'red'] = color

    for gene_name, color in normalise(greens).items():
        colors.at[gene_name, 'green'] = color

    for gene_name, color in normalise(blues).items():
        colors.at[gene_name, 'blue'] = color

    if white_blank:
        blank = {"red": 255, "green": 255, "blue": 255}

        # subtractive mapping so that 0 is white and the maximum the pure channel;
        # this is suspicious when several channels are mixed
        r = 255 - colors.green - colors.blue
        g = 255 - colors.red - colors.blue
        b = 255 - colors.red - colors.green

        # kill negatives and assign
        colors['red'] = r.clip(lower=0)
        colors['green'] = g.clip(lower=0)
        colors['blue'] = b.clip(lower=0)
    else:
        blank = {"red": 0, "green": 0, "blue": 0}
    # JSON-encode both arguments so that they are valid JS
    display(Javascript(f'window.UpdateColors({colors.transpose().to_json()}, {json.dumps(blank)});'))

    return colors
    
This can be called with one or more series and an optional boolean specifying whether the blank colour is white (the default) or black.
import pandas as pd

reds = pd.Series({'protein A': 10, 'protein B': 110, 'protein C': 1})

update_colors(reds=reds)
[Figure: the cartoon of Protein A, Protein B and Protein C after recolouring]
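For the call above, the resulting colours can be checked by hand. The sketch below redoes the arithmetic of update_colors for the reds series in plain Python (no pandas or notebook needed), assuming the default white_blank=True:

```python
reds = {"protein A": 10, "protein B": 110, "protein C": 1}

peak = max(reds.values())
# normalise: scale so the maximum is 255; like astype(int), int() truncates
normalised = {name: int(value / peak * 255) for name, value in reds.items()}

# white_blank=True with only reds: r = 255 - green - blue = 255,
# g = 255 - red - blue = 255 - red, b = 255 - red - green = 255 - red
colours = {name: (255, 255 - red, 255 - red) for name, red in normalised.items()}

print(normalised)  # {'protein A': 23, 'protein B': 255, 'protein C': 2}
print(colours)     # protein B is pure red, protein C almost white
```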
