Working around segmentation faults of pyrosetta: threads & processes

Friday 21 February 2020

Working around segmentation faults of pyrosetta: threads & processes

Rosetta often does not die gracefully. Pyrosetta is the same. If the starting template is not great segmentation faults will result in the kernel issuing signal 11 to kill the process. The way around it is to spin it up as its own process via the multiprocessing module and not the threading module, because child threads use the same process.

Signal handling is too late

A Segmentation fault is not an error that gets raised is a signal given to the process to terminate it. As a result, in a Jupyter notebook you'll get a modal that informs you that the IPython kernel is restarting. signal.signal(signal.SIGSEGV, sigHandler) can block it in the main thread, but once it is blocked all child threads have terminated and there is only ruins of what was once a glorious process.

Testing code

From the technical point of view, it occurs when the wrong memory address is tried to be accessed. There are many ways that it can happen, including as a result of a stack overflow, for example. These two lines will raise a segmentation fault.
import ctypes
ctypes.string_at(0)
This happens because there is no string at address 0. When it is run in a script it terminate it. In IPython it does the same:
(base) matteo@brc10:~$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.string_at(0)
Segmentation fault (core dumped)
(base) matteo@brc10:~$ 
If this is excuted in a thread it will kill not only the thread but the process, therefore, it there if it happens in a job managed by the APScheduler the whole thing will crash. If it happens for a request in Pyramid or Flash, it will kill the application. Here is an example that terminates the script:
import threading
import ctypes
def subpro():
    ctypes.string_at(0)
t =threading.Thread(target=subpro)
t.start()
time.sleep(0.1)
print(t.is_alive())
In a Jupyter notebook, the IPython kernel will restart for that notebook, but not the others are unaffected. This is because they are different processes with different ids. Sister threads get run within the same core and share data, whereas processes may not.

Multiprocesses

The distinction between processes and threads is important. In fact, paralleling code with the threading module will be limited to one core and does not step over the Global Interpreter Lock, while multiprocessing is not limited. Threads are great for simple things like timeout timers for which there is in fact a inheriting class called Timer that does exactly that. For networking application asyncio is similar but works more like JavaScript and is more efficient at (a)waiting for the response to come back.
This partially why Pool is a class in multithreading and but not in threading. To get reponses back a subprocess requires one of various communication means, such as Pipe. Here is the above method, but with the twist that it dies on command. Between this Abraham-Isaac pair of methods communication is done via Pipe. However, if the process dies, the pipe receive command (parent_conn.recv()) will wait for eternity.
from multiprocessing import Process, Pipe
from threading import Thread
import ctypes, time

def subpro(must_die, conn):
    if must_die:
        ctypes.string_at(0)
    else:
        conn.send('Thanks you for not killing me.')

def hyperpro():
    parent_conn, child_conn = Pipe()
    for must_die in (False, True):
        p = Process(target=subpro, args=(must_die, child_conn))
        p.start()
        msg = parent_conn.recv()
        print('I asked for you to die?', must_die, 'Are you alive? ', p.is_alive(), 'Thoughts?', msg)

hyperpro()
To avoid this the reading must happen after that the subprocess ends with an error message queue up in case there is no message.
def hyperpro():
    parent_conn, child_conn = Pipe()
    for must_die in (False, True):
        p = Process(target=subpro, args=(must_die, child_conn))
        p.start()
        while 1:
            if not p.is_alive():
                child_conn.send('The dead do not speak')
                msg = parent_conn.recv()
                break   
        print('I asked for you to die?', must_die, 'Are you alive? ', p.is_alive(), 'Thoughts?', msg)
the whole pipeline does not come tumbling down! Hurray! Well... Unfortunately, if the data sent thorough the pipe is large, the pipe will not send all the data at once and terminate the child, but will kind-of-wait forever as it is still alive. This is the case with pyrosetta, but luckily there is another command pipe.poll(), which gives a true or false:
while 1:
    if parent_conn.poll():
        break
    elif not p.is_alive():
        child_conn.send({'error': 'segmentation fault'})
        break
    else:
        pass

No comments:

Post a Comment