Getting lists of things
Often I find myself needing a list of things of a given type for some hobby project or other, say all airplanes, Greek heroes, Roman emperors or stars.
All Wikipedia pages fall under a number of categories, listed at the bottom of the page. These are a great way to get all the articles in a given grouping. For some things there are probably good JSON or CSV files, but for others it's a lot harder. Stars actually belong to the easy group, thanks to a great dataset (the HYG database), but for illustrative purposes I'll use them anyway, as it's a simple category.
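Category members can also be listed programmatically. Below is a minimal sketch, assuming the standard MediaWiki Action API endpoint of English Wikipedia and its `list=categorymembers` query, which returns at most 500 members per request and signals that more are available with a `continue` object. The function names and the User-Agent string are my own placeholders.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # English Wikipedia's Action API endpoint


def category_members_url(category, cont=None):
    """Build a list=categorymembers query for the pages and subcategories of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",  # the maximum for ordinary (non-bot) clients
        "format": "json",
    }
    # When a response contains a "continue" object, it is sent back verbatim to get the next batch.
    params.update(cont or {})
    return API + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    # Placeholder User-Agent: Wikimedia asks scripts to identify themselves descriptively.
    req = urllib.request.Request(url, headers={"User-Agent": "list-of-things-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_members(category, fetch=fetch_json):
    """Yield (title, namespace) for every member of a category, following continuation."""
    cont = None
    while True:
        data = fetch(category_members_url(category, cont))
        for member in data["query"]["categorymembers"]:
            yield member["title"], member["ns"]
        cont = data.get("continue")
        if not cont:
            return
```

For example, `for title, ns in category_members("Category:F-type main-sequence stars"): print(ns, title)` would print articles (namespace 0) and subcategories (namespace 14) together; the `fetch` parameter exists only so the network call can be swapped out.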
Exploring a category
For example, let's take as a random starting point Tabby's star (the star with strange properties that was hypothesised to have a Dyson sphere, before astronomers found it was something "extremely interesting", but far less so than alien megastructures). It is not obscure, but not mainstream either. It is tagged with several categories, including Category:F-type main-sequence stars, which is a child category of Category:Main-sequence stars, which in turn is a child of Category:Stars by luminosity class. The further up you go, the broader the category gets, but child categories you probably don't want also creep in. For instance, Category:Stars contains Category:Coats of arms with stars, so if I scraped from there I'd get crap like the Seal of Oklahoma. So it is best to explore the category structure a bit first, to avoid this rather than fighting it afterwards.
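One way to explore the tree without drowning in unwanted branches is a breadth-first walk with a depth limit and an explicit list of categories to skip. This is a sketch: `list_members` is a hypothetical callable returning `(title, namespace)` pairs for a category (for instance a wrapper around the MediaWiki API), where namespace 0 is articles and 14 is categories.

```python
from collections import deque

ARTICLE_NS = 0    # MediaWiki namespace of ordinary articles
CATEGORY_NS = 14  # MediaWiki namespace of Category: pages


def walk_category(root, list_members, max_depth=2, skip=()):
    """Breadth-first walk of a category tree, returning article titles in discovery order.

    list_members(category) is assumed to return (title, namespace) pairs.
    Subcategories deeper than max_depth, or listed in skip, are not entered.
    """
    skip = set(skip)
    seen_categories = {root}
    seen_articles = set()
    articles = []
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for title, ns in list_members(category):
            if ns == CATEGORY_NS:
                # Descend only within the depth limit, and never into skipped or already-visited categories.
                if depth < max_depth and title not in skip and title not in seen_categories:
                    seen_categories.add(title)
                    queue.append((title, depth + 1))
            elif ns == ARTICLE_NS and title not in seen_articles:
                seen_articles.add(title)
                articles.append(title)
    return articles
```

Under these assumptions, `walk_category("Category:Stars", list_members, skip={"Category:Coats of arms with stars"})` keeps the Seal of Oklahoma out. Starting with a small `max_depth` and raising it gradually is a cheap way to see at which level the junk starts to appear.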
Exploring the infobox
Articles on a given subject will all have a common infobox on the right-hand side. In the case of a star, this is a starbox. Here is the wikimarkup (click edit, then change the query-string value from edit to raw, i.e. &action=raw) for a different star, δ Cephei, which is also a corner case:
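The same trick in code, as a small sketch: build the `index.php?...&action=raw` URL for a title and download the plain wikimarkup. The helper names and the User-Agent string are placeholders of mine.

```python
import urllib.parse
import urllib.request


def raw_url(title, site="https://en.wikipedia.org"):
    """Edit-page URL with action=raw, which returns the article's wikimarkup as plain text."""
    query = urllib.parse.urlencode({"title": title.replace(" ", "_"), "action": "raw"})
    return f"{site}/w/index.php?{query}"


def fetch_wikitext(title):
    # Placeholder User-Agent; Wikimedia asks scripted clients to send a descriptive one.
    req = urllib.request.Request(raw_url(title), headers={"User-Agent": "list-of-things-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For instance, `raw_url("Delta Cephei")` gives `https://en.wikipedia.org/w/index.php?title=Delta_Cephei&action=raw`, and `fetch_wikitext("Delta Cephei")` would return markup like the listing that follows.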
{{about||the variable star type|Delta Cephei variable|the general class of variable stars|Cepheid variable}}
{{Starbox begin
| name = Delta Cephei
}}
{{Starbox image
| image=
{{Location mark
|image=Cepheus constellation map.svg
|float=center
|alt=
|label=
|position=right
|width=280
|mark=Red circle.svg
|mark_width=10
|mark_link=δ Cephei
|x=613|y=1062
}}
|caption=Location of δ Cep (circled)
}}
{{Starbox observe 2s
| component1 = δ Cep A
| epoch = [[J2000.0]]
| constell = [[Cepheus (constellation)|Cepheus]]
| ra1 = {{RA|22|29|10.26502}}<ref name=aaa474_2_653/>
| dec1 = {{DEC|+58|24|54.7139}}<ref name=aaa474_2_653/>
| appmag_v1 = 4.07 (3.48–4.37) / 7.5
| component2 = δ Cep C
| ra2 = {{RA|22|29|09.248}}<ref name=aaa474_2_653/> <!--Right Ascension of the third component-->
| dec2 = {{DEC|+58|24|14.76}}<ref name=aaa474_2_653/> <!--Declination of the third component-->
| appmag_v2 = 6.3 <!--Apparent magnitude of the third component (Johnson-Cousins V system)-->}}
{{Starbox character
| class = F5Ib-G1Ib<ref name=engle>{{Cite journal | doi = 10.1088/0004-637X/794/1/80| title = THE SECRET LIVES OF CEPHEIDS: EVOLUTIONARY CHANGES AND PULSATION-INDUCED SHOCK HEATING IN THE PROTOTYPE CLASSICAL CEPHEID δ Cep| journal = The Astrophysical Journal| volume = 794| issue = 1| pages = 80| year = 2014| last1 = Engle | first1 = S. G. | last2 = Guinan | first2 = E. F. | last3 = Harper | first3 = G. M. | last4 = Neilson | first4 = H. R. | last5 = Evans | first5 = N. R. |arxiv = 1409.8628 |bibcode = 2014ApJ...794...80E }}</ref> + B7-8<ref name=evans>{{cite journal | doi = 10.1088/0004-6256/146/4/93 | title= BINARY CEPHEIDS: SEPARATIONS AND MASS RATIOS IN 5 M ☉ BINARIES | journal=The Astronomical Journal | date=2013 | volume=146 | issue=4 | pages=93 | first=Nancy Remage | last=Evans|arxiv = 1307.7123 |bibcode = 2013AJ....146...93E }}</ref>
| r-i = <!--R-I color-->
| v-r = <!--V-R color-->
| b-v = 0.60
| u-b = 0.36
| variable = [[Cepheid variable|Cepheid]]
}}
{{Starbox astrometry
| radial_v = -16.8<ref name=anderson15/>
| prop_mo_ra = +15.35<ref name=aaa474_2_653/>
| prop_mo_dec = +3.52<ref name=aaa474_2_653/>
| parallax = 3.77
| p_error = 0.16
| parallax_footnote = <ref name=aaa474_2_653/>
| dist_ly = {{nowrap|887 ± 26}}
| dist_pc = {{nowrap|272 ± 8}}<ref name=benedict02/><ref name=majaess2012/>
| absmag_v = {{nowrap|–3.47 ± 0.10}} {{nowrap|(–3.94 - –3.05)}}<ref name=benedict02/>
}}
{{Starbox detail
| component1 = δ Cep A
| mass = {{nowrap|4.5 ± 0.3}}<ref name=apj744_1_53/>
| radius = 44.5<ref name=apj744_1_53/>
| gravity =
| luminosity = ∼2000<ref name=apj744_1_53/>
| temperature = 5,500–6,800<ref name=moore/>
| metal_fe = +0.08<ref name=aaa488_1_25/>
| rotational_velocity = 9<ref name=ciako1970/>
| age_myr = ~100
| component2 = δ Cep B<ref name=anderson15/>
| mass2 = 0.2 - 1.2
<!-- Unfortunately, the Starbox template will only show two components. Since the physical association between δ Cep B and A is much clearer than between C and A, it makes sense to keep B visible for now. Ideally, one should show C as well.-->
| component3 = δ Cep C
| luminosity3 = 500
| temperature3 = 8,800<ref name=apj744_1_53/>
}}
{{Starbox orbit
| reference = <ref name=anderson15/>
| name = δ Cep B
| primary = δ Cep A
| period = 6.03
| eccentricity= 0.647
| k1 = 1.509 ± 0.2
| name3 = Delta Cephei C
| period3 = <!-- Previously listed value of 500yrs is much too small, although no good estimate available -->
| axis3 = <!--Semimajor axis (in arcseconds)-->
| axis_unitless3 = 12,000 [[Astronomical unit|AU]]
| eccentricity3 = <!--Eccentricity-->
| inclination3 = <!--Inclination (in degrees)-->
| node3 = <!--Longitude of node (in degrees)-->
| periastron3 = <!--Periastron epoch-->
| periarg3 = <!--Argument of periastron (in degrees)-->
| mass3 = <!-- Listed figure of 54 solar masses highly dubious, so I'm hiding it. -Gnomon -->
}}
{{Starbox catalog
| names = 27 Cephei, [[Bonner Durchmusterung|BD]]+57 2548, [[Fifth Fundamental Catalogue|FK5]] 847, [[Henry Draper catalogue|HD]] 213306, [[Hipparcos catalogue|HIP]] 110991, [[Harvard Revised catalogue|HR]] 8571, [[Smithsonian Astrophysical Observatory Star Catalog|SAO]] 34508.
}}
{{Starbox reference
| Simbad = delta+Cep
}}
{{Starbox end}}
As we can see, there is a lot of stuff that makes data mining hard, such as the <ref> citation tags. But there is also a large amount of data for us to plunder.
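A rough sketch of the plundering step: first drop the <ref> citations and HTML comments with regular expressions, then split one template's parameters on pipes, but only on pipes at the template's own nesting level, so that nested templates such as {{RA|22|29|10.26502}} and [[link|label]] links survive intact. Regex-based ref stripping is a simplification that can misfire on unusual markup; a proper wikitext parser (e.g. the mwparserfromhell library) would be more robust.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
REF_SELF_CLOSING = re.compile(r"<ref(?:\s[^>]*)?/>")               # <ref name=x/>
REF_PAIRED = re.compile(r"<ref(?:\s[^>]*)?>.*?</ref>", re.DOTALL)  # <ref ...>...</ref>


def strip_noise(wikitext):
    """Remove HTML comments and <ref> citations, which carry no infobox data.

    Self-closing refs must go before paired ones, otherwise the paired pattern
    would run from <ref name=x/> to the next unrelated </ref>.
    """
    for pattern in (COMMENT, REF_SELF_CLOSING, REF_PAIRED):
        wikitext = pattern.sub("", wikitext)
    return wikitext


def template_params(wikitext, name):
    """Return {param: value} for the first {{name ...}} template, or {} if absent."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return {}
    depth, fields, current = 0, [], []
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:  # keep the brackets of nested templates and links
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:  # end of the template we are reading
                break
            current.append(pair)
            i += 2
        else:
            ch = wikitext[i]
            if ch == "|" and depth == 1:  # a separator of this template, not of a nested one
                fields.append("".join(current))
                current = []
            else:
                current.append(ch)
            i += 1
    fields.append("".join(current))
    params = {}
    for field in fields[1:]:  # fields[0] is the template name itself
        key, sep, value = field.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params
```

Applied to the δ Cephei markup above, `template_params(strip_noise(text), "Starbox detail")` should give entries such as `"mass": "{{nowrap|4.5 ± 0.3}}"` and `"temperature": "5,500–6,800"`, still in need of cleaning.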
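In wikimarkup, double curly brackets call a template, with its arguments separated by pipes, e.g. {{RA|22|29|10.26502}}. The starbox is therefore a series of templates, which bots format nicely for readability. Each template's documentation can be found in the browser by searching for "Template:" followed by its name, e.g. Template:Starbox detail. The parameter values, however, come in many forms:
- a number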
- a number with a thousands comma
- a number with a European thousands point (very rare)
- the val template, e.g. gravity
- a nowrap template used with a ± symbol (incorrect, but very common)
- a number with one of several different units (ouch)
- a val/nowrap template with one of several different units outside the template, e.g. rotation
- a val template with one of several different units inside the template, given with the argument ul= (see the val template documentation for the full list of unit codes)
- a range with a hyphen-minus or an en dash (-, –)
- some human-readable value like A: 2.2 B: 202
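A sketch of a normaliser for the simpler cases in that list, assuming Python 3.8+. It handles plain numbers, thousands commas, leading + signs, an en dash or minus sign used as a negative sign, {{nowrap|x ± y}}, {{val|x|y}} (named arguments such as ul= are simply dropped, and asymmetric errors are discarded), a leading ~ or ∼, and hyphen or en-dash ranges, which get their midpoint as the value (an arbitrary choice of mine). European thousands points, units and multi-component values return None so they can be checked by hand.

```python
import re
from typing import NamedTuple, Optional

NUM = r"[-+–−]?\d[\d,]*(?:\.\d+)?"  # optional sign (incl. en dash / minus), digits with commas, decimals
VAL = re.compile(r"\{\{val\|([^{}]*)\}\}", re.IGNORECASE)
NOWRAP = re.compile(r"\{\{nowrap\|([^{}]*)\}\}", re.IGNORECASE)


class Parsed(NamedTuple):
    value: float                   # central value (midpoint for ranges)
    error: Optional[float] = None  # symmetric ± uncertainty, if given
    low: Optional[float] = None    # range bounds, if the value is a range
    high: Optional[float] = None


def _to_float(token):
    token = token.replace(",", "")  # thousands commas; European "5.500" is NOT handled
    if token[0] in "–−":            # en dash or minus sign written as a negative sign
        token = "-" + token[1:]
    return float(token)


def _val_to_text(match):
    # Keep positional arguments only: {{val|x|err|ul=...}} -> "x ± err".
    args = [a.strip() for a in match.group(1).split("|") if "=" not in a]
    if not args:
        return match.group(0)
    return f"{args[0]} ± {args[1]}" if len(args) == 2 else args[0]


def parse_value(raw):
    """Turn a Starbox parameter value into a Parsed tuple, or None if unrecognised."""
    text = VAL.sub(_val_to_text, raw)  # val first, in case it sits inside a nowrap
    text = NOWRAP.sub(r"\1", text)
    text = text.strip().lstrip("~∼").strip()  # "approximately" is treated as the plain value
    if m := re.fullmatch(rf"({NUM})", text):
        return Parsed(_to_float(m[1]))
    if m := re.fullmatch(rf"({NUM})\s*±\s*({NUM})", text):
        return Parsed(_to_float(m[1]), error=_to_float(m[2]))
    if m := re.fullmatch(rf"({NUM})\s*[-–]\s*({NUM})", text):
        low, high = _to_float(m[1]), _to_float(m[2])
        return Parsed((low + high) / 2, low=low, high=high)
    return None  # units, multi-component values, etc.: left for a human
```

With the δ Cephei values, `parse_value("{{nowrap|4.5 ± 0.3}}")` gives `Parsed(value=4.5, error=0.3, low=None, high=None)`, while `parse_value("A: 2.2 B: 202")` gives None.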