Showing posts with label python. Show all posts

Thursday, 24 August 2023

Reading compressed molecular files on NFS


There are some tasks that make one feel like a failed door-to-door evangelist, and one of them is proselytising about using compressed files on networked file systems. Namely, NFS shares are slower than local SSDs, so it is most often quicker to read compressed files straight into memory than to decompress them to disk first. Here are two Python snippets for dealing with small molecule files.
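
As a minimal sketch of the in-memory approach (not necessarily the snippets from the full post), the standard gzip module can read a compressed text file without ever writing a decompressed copy to disk. The function name read_gzipped_text and the two-record demo file are made up for illustration, and the demo writes to a temporary folder rather than a real NFS share.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path: str) -> str:
    """Decompress a gzipped text file straight into memory."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


# Hypothetical demo content: two placeholder records split by the SDF delimiter.
demo_text = "first record\n$$$$\nsecond record\n$$$$\n"

with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.sdf.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write(demo_text)
    recovered = read_gzipped_text(demo_path)

# Number of records found in the decompressed text.
record_count = recovered.count("$$$$")
print(record_count)
```

The returned string can then be handed to whichever parser is in use; that step is left out here as it depends on the toolkit.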

Sunday, 2 July 2023

A note on PLIP interactions

PLIP is a handy tool to enumerate the interactions of a given ligand. However, a few tripping points I keep hitting stem from the fact that the interactions are namedtuples. Here are some notes to circumvent the traps.
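
As a general reminder of how namedtuples behave, here is a small sketch using only the standard library. DemoInteraction and its three fields are hypothetical stand-ins and are not claimed to match the fields PLIP actually uses.

```python
from collections import namedtuple

# Hypothetical stand-in for an interaction record; not PLIP's real fields.
DemoInteraction = namedtuple("DemoInteraction", ["residue_name", "residue_number", "distance"])
contact = DemoInteraction("PHE", 82, 3.7)

# Fields can be read by attribute or by position.
print(contact.distance, contact[2])

# A namedtuple is immutable: _replace() returns a modified copy instead.
closer = contact._replace(distance=3.1)

# _fields lists the names and _asdict() gives a plain dictionary, e.g. for a table row.
print(DemoInteraction._fields)
row = contact._asdict()

# It is still a tuple, so it compares equal to a bare tuple with the same values.
same_as_tuple = contact == ("PHE", 82, 3.7)
print(same_as_tuple)
```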

Sunday, 5 March 2023

7 colour electronic paper

For Christmas I received a 5.65" seven-colour e-paper display, which is awesome. The catch, as with everything involving a Raspberry Pi or Arduino, is that beyond the gloss of the advert lies something far from a flexible plug-and-play system. I enjoyed my voyage, but it was rather odd, even if typical of a Raspberry Pi project.

Sunday, 22 January 2023

Typing emoji with a Pico keypad


I got myself a Pimoroni RGB keypad, a keypad with 16 coloured buttons controlled by a Raspberry Pi Pico. So the first thing I wanted to do was code it to output emoji, because I am a very professional person. However, this was not as simple a task as I had hoped.

Sunday, 20 November 2022

glibc 2.36 vs. CentOS 7: a tale of failure

My favourite part of coding is planning and implementing some cool idea for doing something, especially if it involves some fun maths I read up on Wikipedia a minute beforehand. In reality, polishing dirty data, refactoring someone else's bad code, reverse engineering the use of a module and trying to get stuff to work are what take up most of my time.

Having got cocky, I thought I could get the latest GNU C Library (glibc) working on CentOS 7. I failed miserably; here is my sorry tale down the rabbit hole.

Friday, 4 November 2022

In ML a module is not a namespace but a base class, because... ?

Deep learning is changing the world, and fast. The list of achievements is impressive, but why focus on the positive when we can moan about the negative? In this blog post I will discuss three minor details that I find annoying about deep learning, namely the key word Module, the limited use of Google/Coral Edge TPUs and the coding quality of the field.

Saturday, 8 October 2022

Star imports trick

Star-imports (from typing import *) in Python are handy but dangerous. They are meant for quick coding, e.g. in a JupyterLab notebook. However, they are bad as they can mask other variables and cause issues down the line. They are ubiquitous online, as are guides explaining why they are bad; here I just want to share a handy snippet to iron out star-imports.
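
Independently of the snippet the post refers to, one way to iron out a star-import can be sketched as follows: list the names the star-import would bind, then keep only those that are actually used. The helper names star_import_names and explicit_import are hypothetical, and the set of used names is typed by hand here rather than extracted from a script.

```python
import importlib


def star_import_names(module_name: str) -> list[str]:
    """List the names that `from module_name import *` would bind."""
    module = importlib.import_module(module_name)
    # If the module defines __all__, that decides what a star-import exposes;
    # otherwise every public (non-underscore) name is imported.
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    return sorted(names)


def explicit_import(module_name: str, used_names: set[str]) -> str:
    """Build an explicit import line for the names that are actually used."""
    wanted = sorted(used_names.intersection(star_import_names(module_name)))
    return f"from {module_name} import {', '.join(wanted)}"


# my_variable is not part of typing, so it is left out of the import line.
import_line = explicit_import("typing", {"List", "Optional", "my_variable"})
print(import_line)  # from typing import List, Optional
```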

Tuesday, 18 September 2018

Python website on a shoestring budget, a tutorial

If you have a Python script you want to make into a website yet want to do it as cheaply as possible, it may seem like a lost cause. But it isn't. Three options are possible:
  1. use a free service
  2. use a grant-based service
  3. run a server at home on a Raspberry Pi
Each has its merits, but the last more so, which is why this tutorial is dedicated to showing you how to do it.

Saturday, 7 November 2015

How shall I name my variables?

Python and naming conventions

Clarity and simplicity are part of the Zen underlying Python (PEP 20). Simplicity in a large system requires consistency and as a result there are various rules and guidelines. Overall, that and the large number of well-documented libraries are what make Python fun. That said, the idea that Python is good is reinforced by the quasi-cultist positivity of the community, especially by Python-only coders who are unaware of some really nice things other languages can do.

In fact, there are frustrating Pythonic things that pop up, some that are extremely granny-state, such as

  • lack of an increment operator (++) because supposedly it is confusing (which it is not) and redundant (yet there are three different string format options)
  • chaining is barely possible with lists (filter, map and join aren't list methods) because it supposedly makes spaghetti code
  • pointers and referencing aren't a thing because they are confusing and dangerous
  • parallelisation is implemented poorly, relative to Matlab or Julia
  • JavaScript asynchronicity is confusing, but the Python asyncio excels
  • And several more

But overall, it is very clean. One thing that is annoying is that the name styles are not consistent. There is a PEP that names the naming styles (PEP8), but it does not make good suggestions on when to use which. In my opinion this is a terrible shame as this is where stuff starts crumbling.
In brief the problem arises with joining words and three main solutions are seen:
  • lowercase
  • lowercase_with_underscore
  • CamelCase (or CapWords in PEP8)
There are many more, but there are memes out there listing them with way more pizzazz than I have.
The first is the nice case, but that never happens. I mean, if a piece of Python code more than ten lines long does not have a variable that holds something that cannot possibly be described in a word, it most likely should be rewritten in a more fun way with nested list comprehensions, some arcane trick from itertools and a lambda. So officially, joined_words_case is for all the variables and CamelCase is for classes. Except... PEP8 states: "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility", a.k.a. they gave up.

Discrepancies and trends

That a class and a method are different seems obvious except in some cases where it becomes insane.
In the collections library, defaultdict and namedtuple are in lowercase, as they are factory methods and the standard types are lowercase, while OrderedDict is in CamelCase. Single-word datatypes are equally inconsistent: Counter is in CamelCase, while deque is in lowercase. All main library datatypes are in lowercase, so it is odd that such a mix would arise, but the documentation blames how they were implemented in the C code. In the os library, os.path.isdir() checks if a filepath (string) matches a directory, while the entries in the iterator returned by os.scandir() have is_dir() as a method, which is most likely a sloppy workaround to avoid masking. Outside of the standard library, the messiness continues. I constantly use Biopython, but I never remember what is underscored and what is not, and keep having to check cheatsheets, to the detriment of simplicity.
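
The mix of spellings can be seen directly in the standard library. The sketch below only uses real names from collections and os; the temporary folder and its subfolder are made up for the demonstration.

```python
import collections
import os
import tempfile

# Classes in collections: some lowercase, some CamelCase.
for name in ("defaultdict", "deque", "OrderedDict", "Counter"):
    obj = getattr(collections, name)
    print(f"{name}: is a class -> {isinstance(obj, type)}")

# namedtuple is a factory function that builds classes, not a class itself.
namedtuple_is_class = isinstance(collections.namedtuple, type)
print(f"namedtuple: is a class -> {namedtuple_is_class}")

# Same question, two spellings: os.path.isdir() versus the is_dir() method
# of the entries yielded by os.scandir().
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, "subfolder"))
    with os.scandir(folder) as entries:
        checks = [(entry.name, os.path.isdir(entry.path), entry.is_dir()) for entry in entries]
print(checks)
```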
There are some trends in the conventions nevertheless. CamelCase is a C thing, while underscores are a Ruby thing: this is probably why I feel safer using someone's library or script that uses CamelCase. Someone wrote a paper and found CamelCase to be more reliable in terms of errors. Personally, I like lowercase all the way, no camels or underscores, and the standard Python library seems to be that way and it is really pleasant.
FULL_UPPERCASE variables are often global variables used as settings or come from code written by capslock-angry people (actually, if the Whitespace language is a thing, why is there no capslock language?). Visual Basic is case insensitive and it makes my skin crawl when I look at its "If" and "For" statements.
Single letter variables are either math related or written by an amateur or someone who gave up towards the end —such as myself all the time— because no word came to mind to answer the question "How shall I name my variable?".

My two pence: inane word newfangling

The built-in methods of the mainspace and datatypes are all lowercase without underscores (e.g. open("file.txt").readline()), so there is consistency at the heart of it. Except that lowercase without underscores is not often recommended, as it is the hardest to read of the three main ways, although it is the easiest to type and possibly to remember, except when a word is a verb that could have been in the present, past or present participle form. Plus open("file.txt").read_line() is ugly and I feel really anti_underscoring.
German and many other languages are highly constructive, and words and affixes can be added together. I have never encountered German code, but I would guess the author would have had no qualms in using underscoreless lowercase. The problem is that English is not overly constructive with words of English origin, as most affixes are from Latin. The microbiology rule of an -o- linker for Greek, -i- for Latin and nothing for English does not really work, as Anglo-Latin hybrids look horrendous. Also, using Greek or Latin words for certain modern concepts is a mission and, albeit fun, lacks clarity. The Anglish moot has some interesting ideas if someone wanted a word fully stemming from Old English and free of Latin. Nevertheless, I like the idea of solely lowercase, and coining new words is so fun, except that it quickly becomes hard to read. Whereas traditionally getting the Graeco-Latin equivalents and joining them was the chosen way, nowadays portmanteaux are really trendy. In the collections module, deque is a portmanteau and I personally like it more as a name than defaultdict. How about defaultionary?
As a Hungarian notation for those variables that are just insane, I have taken to adding "bag" as a suffix for lists and sets (e.g. genebag) and "dex" for dictionaries (e.g. genedex), which I have found rather satisfying and actually has helped (until I have to type reduced_metagenedexbagdex).

Hungarian tangent

That leads me to a tangent, Hungarian notation. I wrote in Perl for years, so the sigil notations for an object's type left a mark. Writing st_ for string and other forms of Hungarian notation would just be painful and wasteful in Python, but minor things can be done, such as naming lists as plural nouns and functions as verbs. Except it seems to go awry so quickly!
Lists, sets and dictionaries. Obviously, a collection should not be named with the singular of its elements, as that gives painful results, but I must admit I have done so myself too many times. Collective nouns are a curious case, as they solve that problem and read poetically (for sheep in flock), but there are not that many cases where that happens.
Methods. An obvious solution for methods is to have a verb. However, this clearly turns out to be a minefield. If you take the base form, many will also be nouns. If you take the present participle (-ing) the code will be horrendous. If you take the agent noun (-er, -ant), you end up with the silliest names that sound like an American submarine (e.g. the USS listmaker).
Metal notation. The true reason why I have opened this tangent is to mention metal notation. If one has deadkeys configured (default on a Mac), typing accents is easy. This made me think of the most brutal form of notation: the mëtäl notation. Namely, use as many umlauts as possible. I hope the Ikea servers use this. Although I am not overly sure why anyone would opt for the mëtäl notation. In Matlab there are a ridiculous number of functions with very sensible names that may be masked, so the mëtäl notation would be perfect, except for the detail that Matlab does not like Unicode in its variables. One day I will figure out a use…
Nevertheless, even though Hungarian notation is somewhat useful, Python seems to survive without it: I personally think that most issues arise with instances of some weirdo class and not with a standard datatype anyway. So there is no need to go crazy with these, it is just fun.

Conclusion

Nevertheless, even if there were a few exceptions, it is my opinion that a centralised Pythonic ruleset would have been better. The system that I would favo(u)r is compulsory lowercase, as is seen for the built-in names (parenthetically, American spelling is a given: it did not take me long to spell colour "color" and grey "gray"). The reason why lowercase is disfavoured is that it is hard to read when the words are long. In my opinion variable names should not be long in the first place. One way around this is making a sensible portmanteau or a properly coined word and just refraining from overly descriptive variables. At the end of the day, arguments for legibility at the cost of consistent and therefore easy usage make no sense. defaultdict takes a fraction of a second more to read, but looking up how a word is written can take minutes.

Wednesday, 14 October 2015

A note about serving static files with Python's wsgi

The best way to learn how to use Python for the web is to make something simple in Flask, revisit the tutorial on how to set stuff up with more than a single file and then graduate to a framework like Pyramid or Django.
Starting vanilla and using wsgi is a different matter, as one gets bogged down in stupid things, which is not actually didactic. One issue I had was serving static files.

Monday, 28 September 2015

Pythonic spinner

Python is fun: it has lovely libraries, is a beauty to type and there are constant surprises. I only recently found out about defaultdict (collections library), which has been around since Python 2.5 and is phenomenal. With the web there are three options:

  • It can be used on the server side with the wsgi library or the Django framework. OpenShift is a great freemium script hosting service; I have used it here for example.
  • There are also some attempts to make JS parse Python in the browser, namely Skulpt and Brython, but as they convert the Python code into JS code they are not amazingly fast (1); still, you are showing the world Python code.
  • One can transpile Python to JavaScript, which is faster and less buggy, but that is unethical as you'd be serving JS and not a Python script (CoffeeScript gets a lot of bad rep for that reason).

Thanks to CSS3 there are a lot of cool spinners out there to mark code that is loading, but none cater for Python users. Therefore I made my own spinner icon.
The code is hosted on GitHub: https://github.com/matteoferla/Pyspinner.
<link href="https://rawgit.com/matteoferla/Pyspinner/master/pyspinner.css" rel="stylesheet"></link>
<span class=pyspinner></span>

(1) I tried Brython and liked that it had a mighty comprehensive series of libraries and that it had DOM interactions similar to jQuery. However, I could not get over the fact that, for me at least, changes to DOM elements were not committed until the code finished or crashed, which brought back bad Perl memories. Also, the lack of CSS changes and the nightmare of binding functions to events make me think I might try other options.

Tuesday, 3 March 2015

Bacterial Dissociated Press

UPDATE: The code has been adapted to become an online bacterial name generator thanks to Brython.
There are several ways to make a random name generator, which differ mainly in how the probability of each letter is determined. A Markov chain is a model where the probability of finding a letter depends on the previous letter(s). In the script I wrote, the first and last letters are underscores, thus preserving the probability of first and last letters; the downside is that there is no letter minimum or maximum.
These are two names I got out of 100 trials:

  • Cobaceobacumomyeriflvijerellaniocidenactilasttiacobacoballlitelanas priartrogntubophae
  • Ps ge

The dataset used to obtain the probabilities of letter combinations (training) influences the results obviously. Here I parsed LPSN for species names and treated the genus and the species epithet differently.
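
The scheme described above can be sketched roughly as follows. This is an illustration and not the script from the post: the function names train and generate, the reading of kmer as "one new letter plus kmer - 1 letters of context" (so kmer must be at least 2) and the four-genus training list are all assumptions, and the genus and the epithet are not treated separately here.

```python
import random
from collections import Counter, defaultdict


def train(words, kmer=2):
    """Count which letter follows each context of kmer - 1 letters."""
    context_size = kmer - 1
    counts = defaultdict(Counter)
    for word in words:
        # Underscores mark the start and the end, so both are learnt as well.
        padded = "_" * context_size + word.lower() + "_"
        for i in range(len(padded) - context_size):
            context = padded[i:i + context_size]
            counts[context][padded[i + context_size]] += 1
    return counts


def generate(counts, kmer=2, rng=random):
    """Walk the chain from the start marker until the end marker is drawn."""
    context_size = kmer - 1
    context = "_" * context_size
    letters = []
    while True:
        options = counts[context]
        # Draw the next letter in proportion to how often it followed this context.
        letter = rng.choices(list(options), weights=list(options.values()))[0]
        if letter == "_":
            return "".join(letters).capitalize()
        letters.append(letter)
        context = (context + letter)[-context_size:]


# Tiny stand-in training set; the post parsed LPSN instead.
genera = ["Bacillus", "Clostridium", "Streptomyces", "Pseudomonas"]
model = train(genera, kmer=2)
for _ in range(3):
    print(generate(model, kmer=2))
```

As in the post, nothing bounds the length: a name ends only when the end marker happens to be drawn, so both very short and very long names can come out.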
Amusingly, using only two-letter combinations gives fairly Latin-sounding names. Here are ten non–cherry-picked examples:

  • Vucocovinenacrenistrngidobacirdocotromyconsipumostereus ateue
  • Peus s
  • A zonvi
  • Aceropareainobastheptha menzi
  • S lfa
  • Kllacererhelucoccimycelusm ngum
  • Lumibodom pensidiaqiscusicanise
  • M psplumolinisonolacia
  • Rerella or
  • Lochadibriciziberononacomyctritr selietimolerbumiseelaltaleni

They look like species names, but are not quite right. Some affixes appear, as 'Kllacererhelucoccimycelusm' testifies (cocci, myce), but so do consonant clusters that are impossible for Latin (klla, which I'd try reading as kł; ng at the start of a word is normal in many languages, but Latin does not have a ŋ sound). Going through the list there are some cool ones that look diabolically hard to read (Seogactychoreriumacesheraes wautenditolans), have more Welshness (Llacosula thiabisini) or have recognisable words (Mum joliaiinecuvigoransis).
When the kmer is changed from 2 to 3, the names make a lot more sense. A lot more affixes surface and fewer weird consonant clusters. Here are 100 non–cherry-picked entries.
I like that one is 'Clobacterium aliens'. There are some other real words and some odd ones, such as yogalitre —a novel bohemian unit of volume.
  • Enoalobariononasacter futerxiackitrum
  • Lapper ase
  • Paulfobiumickeyerium yogalitre
  • Strellum gitormeneaericus
  • Shellococolanoreptomycococchrio mens
  • Chthenia fentanonii
  • Sacilla abus
  • Strevosphinobiseudoanopla baalii
  • Microidus cociensis
  • Sporynes virotum
  • Microma aneaxenni
  • Bacteimetocobactomonomicinorynterogangioidethia turomedurpellis
  • Clospora aliensiseudorandrophamper
  • Des paris
  • Clostrenematisher phizoreens
  • Microbacter ansifixtrolytins
  • Acterium psyrarchens
  • Clostreptomuromanterobium gina
  • Bureptostrevoserobacter amardaenii
  • Phaeudobacillusobactomadalmotenbellus aturivalinastis
  • Nobactia senteriensis
  • Clobifidiumonaeriomacinocellus glus
  • Derium se
  • Psia se
  • Preptoraneobactetacter prosalisquaes
  • Actima vense
  • Gemacia crynifichilum
  • Angobacilla chaiwalficaphiphilutyris
  • Exus thens
  • Mer imarinus
  • Streptophizawanorevibactermonas mesbensis
  • Furax anderobicum
  • Flactermonas paramesolachilum
  • Sulus sisidigii
  • Pseudoalder cophiilatis
  • Haenimicrobacila vionii
  • Stahayloseucocorobacilelochrotrea piresele
  • Bacillus agnis
  • Statospormonovobachabacteobacilibrix koreschum
  • Burkhodosalicronoccus licuni
  • Des nonca
  • Ardium sponensis
  • Aces cysalkii
  • Coccus permervensicarinea
  • Streptophizobacterotobacillatomycetburadsketinossobactellum chrophitrariosteracecidiolipalense
  • Streptobacilebacillum sphonensii
  • Rum tus
  • Macesula rundebens
  • Sponaeringomonas sophagii
  • Lyanosphoebacterium matus
  • Amurandroidia clocynitucum
  • Chizoodomicillanelluseubroidobacococas hungbaiwadimuyanensis
  • Xybacterium alkapiensis
  • Amycetoba solitophae
  • Actomonaspobactomonia mer
  • Clobacterium aliens
  • Pla fereenzotidimareequatus
  • Er premanum
  • Xanomonerium licatus
  • Methylososphio inaeens
  • Pedium inans
  • Matoces marum
  • Paea mae
  • Nostreptomanobacter montemophiganginhila
  • Almonaenoclovardium nerrakus
  • Metherococycobium flugongtoriensisciforyntica
  • Seudocococchaenibbellus thilucentoluceus
  • Mydobacter pedia
  • Hangium facransis
  • Strenebsibacromycona lacamingensii
  • Agium stidalis
  • Glum stans
  • Des vibensenigens
  • Cobacteudonacteringium inis
  • Es amitra
  • Alium thili
  • Palgomaduradreptomona wolis
  • Brix vi
  • Phillas marboryzaeongwanenicus
  • Bium callus
  • Zhio lense
  • Thanas wayticolipleolensicalbies
  • Nospirsia ebadiaense
  • Des adaicowskiabilicus
  • Kocococolamsis ans
  • Amyces danivoranse
  • Bruiherillosporreptomyces thillupraecae
  • Satrellas geolisis
  • Des stum
  • Bacillum putiduciticus
  • Stertingobium stenti
  • Spirga odanae
  • Roger oleovoricula
  • Feria asterium
  • Pellacteriodoccus flatus
  • Sphylobacter yogensis
  • Pseudobacter sis
  • Sphomyxillimora latus
  • Strepla puyatlasis