I got recently asked how to figure out how to write a Pyrosetta script when there is no example. This is definitely the biggest weakness of Pyrosetta and Rosetta script, but it is not insurmountable. In fact, there is a wealth of information that is hidden that can be mined. Here is how and in the next post, I give an example.
Repository
In writing this post I put together a bunch of codeblocks into a class (Documentarian
) that can be found in
the GitHub repo Pyrosetta
documentarian (any mention of this has green borders).
Resources
So let's start from the top. How to use Pyrosetta in the first place is documented in a few places.- pyrosetta.org/documentation has the basics introduction...
- ... but pyrosetta.org/faq is the actual documentation
- pyrosetta.org/tutorials has some examples...
- ... but github.com/RosettaCommons has better and more explanations
- graylab.jhu.edu/PyRosetta.documentation has the API endpoints
- And obviously reading the relevant papers is really helpful for both Rosetta and Pyrosetta —two must reads are Relax paper and the second ref2015 paper.
Choosing a mover
The sometimes trickiest part is figuring out which mover to use. If there's a binary that you want emulate so that is great. The
table of movers has the major ones, but there are many more. For something not listed that you
suspect Rosetta/Pyrosetta can do the next best options are...
(a) ask and get lucky with a reply,
(b) Google until you get lucky or
(c) search/read the names of the protocols
listed in the Pyrosetta
documentation until you get lucky.
One thing to remember is that sometimes names are misleading. For example, there are a dozen methods called "ddG": some are for mutation scoring, other for interfaces or ligands. The NetCharge filter is pyrosetta.rosetta.protocols.simple_filters.NetChargeFilter
and not pyrosetta.rosetta.core.scoring.netcharge_energy.NetChargeEnergy
.
Another is not to give into the tyranny of sunken costs: if you gave it a go for more than an hour, let it go and try a different approach —for example, I wanted to get the pKA of a protein from model as opposed to from sequence, so I tried pyrosetta.rosetta.protocols.simple_moves.ReportEffectivePKA
, which wanted LYS_D params file etc. only to encounter more issues, until I read the C++ code comments, which made me realise it wasn't what I wanted in the first place...
Using a mover
Given a mover there are two things we want to know, how to use it and what it does internally. Let's look at the first challenge.
GitHub
So there are a couple of tutorials online and searching for the name of the mover may reveal them. Search the mover name prefixed with an asterisk may reveal others. Alternatively search the mover without the word Mover at the end as some don't have that part in the C++ code. If there are many tagging the word GitHub can reveal some handy code snippets. Not all places are however googlable, for example there are some handy files in the Jens Meiler lab webpage but are zipped or as pdfs.Converting a XML script
Alternatively convert an XML script.Finding the XML script
In the Rosetta folder under the pathmain/source/src/main/tests/integration/tests
,
there are many XML scripts. Additionally, more can be found in various places mentioned above.
However, the amount hiding in that path is staggering.
Prepare for conversion
There are many ways to do the XML Rosetta script to Pyrosetta, but for me, the best way is in a Jupyter notebook doing the following steps.- Copy-paste the XML in a markdown cell, but tab it to prevent rendering of the XML tags.
- Start Pyrosetta preferably with some of the flags for the XML (a file called flags or commands.txt) —later you can use the Options function.
- Import a pose, preferably the one with the XML.
- Then copy each tag into a cell as a comment —
<MoverName ...<
is called a tag, which you see a lot in Pyrosetta methods. - And start re-writing (see below)
- Testing each block as you go along (see below)
Converting tag by tag
Each tag starts with a name in CamelCase. The SCOREFXN will most likely require a particular set of weights, but you create that withpyrosetta.create_score_function
.
The Pyrosetta equivalent of the filters, mover or tasks tags will have the same name, except that they may have an extra word "Mover" or "Filter" or "Task" lopped onto the end. Simply search the Pyrosetta documentation for them.
The argument names to initialise these will likely differ however. Many constructors (
__init__
magic method in Pythonese)
are overloaded methods, which means they can be initialised with different sets of arguments.
The constructors of classes are not in the online documentation, but help(...)
or trying to
initialise with something clearly wrong, say Exception
, will tell you what it expected —do pay attention to the typehinting as this will reveal what is meant by a given argument.
In most cases there will be setters,
.set_👾👾()
or if not, the method name that seems more similar will be overloaded
to be a setter or a getter. An example of this is setting a scorefxn, which is via .scorefxn(scorefxn)
in regular movers,
but set_scorefxn
in MCMC movers...
A quicker way that looking it up online is passing your mover (instance or class) to
dir(...)
in a new cell and reading the list.
In some cases vectors are required and sometimes a method accepts a blank vector and fills it as seen in the post about adding waters in Pyrosetta, where vectors are also discussed.
Testing
To use a score function simply call it with the pose as an argument (scorefxn(pose)
),
while for filters and movers use .apply(pose)
.
Leave the movers last and don't forget to give it the filters, taskFactory and scorefunctions. However, you might get a segmentation fault. This means that something was attempted to be accessed but was not there.
Even if the XML script does not explicitly set a TaskFactory or MoveMap or scorefxn, it does not mean that you don't have to. If a mover has one or more methods for these, chances are you will need to specify them. A blank one will do however:
tf = pyrosetta.rosetta.core.pack.task.TaskFactory()
mover.task_factory(tf) # segfaults otherwise
In this regard the pyrosetta.rosetta.protocols.filters.TrueFilte
filter which always returns true, pyrosetta.rosetta.core.select.residue_selector.TrueResidueSelector
selects everything and
pyrosetta.rosetta.protocols.moves.NullMover
mover whose apply
does nothing (which differs from pyrosetta.rosetta.protocols.moves.Mover
, which has a virtual apply
as discussed below)
Running an XML script in Pyrosetta
To run a XML script in Pyrosetta you do the following:xml_parser = pyrosetta.rosetta.protocols.rosetta_scripts.RosettaScriptsParser()
protocol = xml_parser.generate_mover_and_apply_to_pose(pose, "👾👾.xml")
protocol.apply(pose)
This is rather simplistic, as you may require some command line arguments, such as -parser:script_vars
,
which you can feed with the pyrosetta.init
command. I discuss this further
here.
Update: the syntax has changed:
from typing import List
import tempfile, os
xml_parser = pyrosetta.rosetta.protocols.rosetta_scripts.RosettaScriptsParser()
var_names: List[str] = re.findall(r'%%([\w_.]*)%%', xml_string)
print(var_names) # copypaste to assign prefered values
script_vars = pru.vector1_std_string()
script_vars.extend(['foo=1', 'bar=2'])
with tempfile.NamedTemporaryFile(delete=False, mode='w+') as fh:
temp_name = fh.name
fh.write(xml)
xml_parser.generate_mover(temp_name, script_vars)
os.unlink(name)
Namely, generate_mover_and_apply_to_pose
no longer exists, but generate_mover
is more pythonic.
The
protocol
variable above is an instance of
pyrosetta.rosetta.protocols.rosetta_scripts.ParsedProtocol
, which we can inspect.
An XML script has a MOVERS
block with one or more mover tags.
To see what the movers are within a ParsedProtocol object you can call .get_mover(i)
, which will be a normal mover...
But filled correctly!
If you are using pyrosetta_documentarian.Documentarian
, the method .compare()
is very useful as it will give a Pandas table showing the differences with a blank object.
Inspecting the inheritance
All the movers are built upon another. So it can be highly useful to see which methods have been added. Also some methods are virtual only in the parent class (that is they are just a placeholder noot to be called, similar to methods of abstract base classes in Python.If you are using pyrosetta_documentarian.Documentarian
, the dynamic attribute .uninherited
will tell you what isn't inherited.
The cogs of a mover
provide_citation_info method
If you want to know a technical details of a mover, like how it works, you can see what paper it is published in, by calling the bound methodprovide_citation_info()
, which actually gives a rather meaningless return, which
you need to finangle:
def get_citation(target_mover: pyrosetta.rosetta.protocols.moves.Mover) -> str:
ci = target_mover.provide_citation_info()
if len(ci) == 0:
return ''
else:
cc = ci.pop()
# print(cc.module_name(), cc.module_type())
buffer = pyrosetta.rosetta.std.stringbuf()
cc.get_citations_formatted(pyrosetta.rosetta.std.ostream(buffer))
return buffer.str().strip()
If you are using pyrosetta_documentarian.Documentarian
, the dynamic attribute .citation
will do this
Author → Paper
Alternatively and often more fruitful is finding who the author is. In the C++ code it is the @author line.
Unfortunately, the online Pyrosetta documentation does not show C++ class comments or the constructor (=__init__
in Python) comments, so looking at the actual C++ code is often helpful, but more on that later. Once you have an
author name it is easy to figure out which paper it corresponds to.
If there is nothing at all and Googling the mover name draws blanks, emailing one of the authors is a good option. For example, curious about one mover I got a helpful reply that even included a paper draft.
Read the C++ code
This is really a last resort. However, you only need to know the bare basics of C++ to read it, so it is not that dire.
Rosetta is written in C++ and Pyrosetta wraps this using Pybind11. Some parts of Pyrosetta are pure Python and you can read the code.
The Pybind11 part however does not show the C++ (it's precompiled),
so you'll have to download the regular Rosetta suite to read the C++. So let's see some basics in order to
reverse engineer a class!
Header files
The main thing to understand is that C++ code is often spit into header files and implementation files. When looking a introductions to C++, it has several features in common with Python or JavaScript, but the header file business is totally alien. Basically a C++ class can be split into an empty shell that simply declares the methods within, which is in the header file, a file with the extension.h
, and this shell gets filled in the implementation file, a file with extension
.c
or .cpp
. The code in the header file looks more similar to Python, while the implementation file
has many ::
bits.
Class
A C++ class written in a single file can look like this:class Animal {
private:
// Private attribute
int age;
public:
// some
void Eat() {
std::cout << "Nom nom\n";
}
// Setter
void setAge(int a) {
age = a;
}
// Getter
int getAge() {
return age;
}
};
There are a few things to note here:
- public and private (access) are equivalent to Python single preceding underscore notation, but whereas it's bad practice to call a _method in Python outside of the class instance, you cannot call a private method in C++ outside of the instance.
- Typehint came only recently to Python, but it's mighty useful. In C++ specifying types is compulsory and you cannot change the type of a variable. in the above there is an int and a void (=None). What the methods return is specified before the name
- In python when you import a module you can get its contents with
module.method
kind of notation. In C++ it's similar, namelynamespace::method
. In implementation files, you'll seeclass::method
. - There are no docstrings, but comments (
//
or/* ... */
) above in the header files. These may feature@author
,@description
and other Doxygen-style comment block features. Not all of these went into the python docstrings and then the Sphynx documentation online for Pyrosetta
If you are using pyrosetta_documentarian.Documentarian
, the dynamic attribute .comments
will show all the comments.
No comments:
Post a Comment