In ML a module is not a namespace but a base class, because... ?

Friday 4 November 2022

In ML a module is not a namespace but a base class, because... ?

Deep learning is changing the world and fast. The list of achievements is impressive, however, why focus on the positive, when we can moan about the negative? In this blog post I will discuss three minor details that I find annoying about deep learning, namely the key word Module, the limited use of Google/Coral Edge TPUs and the coding quality of the field.

Module

In programming a 'module' is a contained collection of objects, functions and variables. In deep learning a module is an abstract base class that is used to build neural networks classes. This is confusing because the word module is used in both contexts. The latter is most commonly found in Title case, 'Module', as it's a class.

How on earth something like this was allowed to happen is nonsensical. I could not find any mathematical precedence, so the only explanation I can give it was chosen in haste (vide infra the cowboy coding point).

As deep learning is cooler and shiny than programming theory, advocating that the community use the word 'base class' would not work, so what synonyms can be used for the standard module?

  • namespace: there is a technical difference (scope) between a namespace and a module, just like there's a difference between a list and an array, but it's close enough. In Python there is no namespace type, but there is a module type, in C++ you have only the former and in TypeScript you have both.
  • Not package: a package is a directory of modules, a filesystem thing. A module or submodule is generally a file, but you can declare a module dynamically thanks to types.ModuleType.
  • Not library or collection: a library is a collection of functionality/modules. A module ought to encapsulate a single functionality per the principle of information hiding.
  • Not container: containerization is an OS-level virtualization technique.
  • Not bucket: an Amazon S3 bucket is a storage container accessed via its web service interface.
  • Not bundle: that is a videogame sale term for something old that would not sell otherwise.
  • Not shelf, shelving, rack: these are all physical cluster objects.
  • Not bag: a not quite synoym for Counter
  • Not any other physical object like crate, chest, trunk, bag, sack, pouch, basket, box, can, jar, pot, vessel, vase, case, cabinet, cupboard as most likely there is already taken...

Google Coral Edge TPUs

An Nvidia A100 CPU costs as much as a car. A Google Coral Edge TPU costs as much as a beer in a Youngs pub (~£50). A further cool feature is that there's a USB stick version of the latter. However, it can only be used with TensorFlow Lite, which is a subset of TensorFlow. Models cannot be trained in TensorFlow Lite, only converted from TensorFlow. A model trained with TensorFlow can be used in TensorFlow Lite, if the numbers are reduced from 32-bit floating point numbers to fixed point 8-bit integers (post-training integer quantization). This is not great as precision near zero is important. There was criticism earlier this year about several transformer models tinkering with the kernel's subnormals (numbers between 0 and the smallest normal number) which is a terrible thing to do, but illustrates why precision near zero is important and a reducing this precision is not great. The Posit format of number storage is gaining momentum: in float-point arithmetic a real number is stored as a power, e.g. -2.1e10, where you have a sign bit, some exponent bits and some significand/mantissa bits (sign * mantissa * 2^exponent), in a posit you have an extra terms (sign * mantissa * 2^exponent times unseed^regime-ish), where the extra term is based on the exponential part, making it an exponential of an exponential, making it more precise near zero (tapered precision) and allowing it to be system-friendly (i.e. no new hardware). So potentially with better number storage, the fixed point 8-bit integer quantisation could be replaced with a Posit format. But that is in the future, but I am sure it will come fast and we will stop drooling over GPU cards with 5 digit price tags in the same way folk stopped drooling about clock speeds in the 90s.

Cowboy coding

The field of deep learning is young and the code is not great. The speed in which the field advances favours 'cowboy coding': write fast, document and fix latter, never refactor. This is not great as the code is harder to read and to evolve. Jax is expanded by Haiku to replace Keras —Haiku is inspired by Sonnet, neither related to Poetry. TensorFlow is losing ground to PyTourch. The different packages gain and lose popularity quickly. A prime example is installing them as they were dumped on PyPI without any thought for dependencies. There is a big difference between CUDA and CPU based arithmetic, but a drop-in replacement is a terrible choice as instead happens for tensorflow-gpu. Many of these packages change their API frequently, which is not great for the maintainability of the packages that depend on them. As a result the solution adopted by the community is to pin the versions, which results in installation woes, which are resolved by conda environments, dockerisation and containerisation (these are meant for other purposes). This is not Pythonic in any way, but this laziness is becoming more and more common that some argue that this will kill Python.

Parenthetical disclaimer, installing tensorflow is not the simplest task partially not due to its own fault, but due to the fact that it requires CUDA drivers and the CUDA toolkit in the system, although the latter can be installed with conda, but one must make sure that the version matches the drivers (nvcc --version).

I switched from Perl a decade ago, because the package Moose (object-oriented programming) was too messy. Python is changing however: typehinting is a fantastic addition and 3.10 is said to be faster. Typehinting allows one to get a quick idea of what a variable is. It makes debugging much quicker and makes revising old code a lot less painful. The speed is something I don't quite get, as folk get pendantic over the speed of a few microseconds, yet code that runs slowly is generally so because of poor design, not because of the language.

Pip is changing too: The release of a package to PyPI is often bungled due to user-inexperience but also due to its non-straightforwardness. Packaging non-python code is a common tripping point and the source of many bugs. Alternative installation routes rise and fall (remember easy_install?) —the flavour of the month is poetry. Wheels for compiled code and pip-based virtual environments are a great addition that are making conda a thing of the past.

However, these do not address the root of the problem: cowboy coding. Hopefully, the deep learning field will mature and move away from cowboy coding. But there is excitement of AI writing code, so I am not holding my breath...

No comments:

Post a Comment