Wednesday, 13 June 2012

epershand: "Python: programming the way Guido indented it" (python)
You know, I kvetch about the AO3 a lot, and their coding team has been doing a lot of hustling lately without getting a lot of love, but damn. Sometimes I am just hit by how fucking RIGHT they've done something.

For example: right now I'm helping out starlady with a fandom studies project by writing her a script that looks at fanfiction HTML and extracts the fandom, ship, publication date, etc. I'm writing dedicated parsers for a few major fic archives.
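Since each archive gets its own parser, the script first has to decide which parser a page belongs to. A minimal sketch of that dispatch step, assuming pages are identified by their URL's hostname; the `ARCHIVES` table, its labels, and `archive_for` are hypothetical names for illustration, not the actual script:

```python
from urllib.parse import urlparse

# Hypothetical hostname -> archive-label table; a real script would map each
# label to its dedicated parser function instead of a string.
ARCHIVES = {
    "archiveofourown.org": "ao3",
    "fanfiction.net": "ffnet",
    "yuletidetreasure.org": "yuletide",
}


def archive_for(url):
    """Return the archive label for a fic URL, or None if no parser exists."""
    host = urlparse(url).netloc.lower()
    # Treat "www.example.org" and "example.org" as the same archive.
    if host.startswith("www."):
        host = host[len("www."):]
    return ARCHIVES.get(host)
```

For instance, `archive_for("https://www.fanfiction.net/s/12345/1/")` gives `"ffnet"`, and an unknown site gives `None`, which is a natural place to log "no parser yet" and skip.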

This is (roughly speaking) what my code looks like for the AO3:

def GetAo3Metadata(self):
    """Extract metadata from Archive of Our Own Beautiful Soup object."""
    self.metadata.author = ...  # Find the "a" tag with the class "login author"
    self.metadata.title = ...  # Find the "h2" tag with the class "title heading"
    self.metadata.rating = ...  # Get all items from the list with the class "rating tags"
    # etc.
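With Beautiful Soup, each of those AO3 lookups is a one-liner along the lines of `soup.find("h2", class_="title heading").get_text()`, because a class string containing a space is matched exactly. To show the same "find tag by class" idea without any third-party dependency, here is a sketch using only the standard library's `html.parser`; `ClassTextCollector`, `find_text`, and the sample markup are made up for illustration and only mimic the class names described above:

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text inside each <tag class="..."> whose class string matches exactly."""

    def __init__(self, tag, class_value):
        super().__init__()
        self.tag = tag
        self.class_value = class_value
        self.depth = 0      # > 0 while we are inside a matching element
        self.matches = []   # one text string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so we close at the right one.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and dict(attrs).get("class") == self.class_value:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def find_text(html, tag, class_value):
    """Return the whitespace-normalized text of every matching element."""
    parser = ClassTextCollector(tag, class_value)
    parser.feed(html)
    parser.close()
    return [" ".join(m.split()) for m in parser.matches]


# Made-up markup that follows the class names described in the post.
SAMPLE = """
<h2 class="title heading">An Example Fic</h2>
<a class="login author" href="/users/someone">someone</a>
<dd class="rating tags"><ul><li><a href="#">General Audiences</a></li></ul></dd>
"""
```

Here `find_text(SAMPLE, "h2", "title heading")` returns `["An Example Fic"]`. The point is the one the post makes: when the markup carries meaningful classes, each field is a single, boring lookup.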


This is, roughly speaking, what the code looks like for everything else:

def ParseFanfictionNetMetadata(self):
    """Extract metadata from Fanfiction.net Beautiful Soup object."""
    # Find the block called "gui_table1" because, you know, that's meaningful.
    # Fuck it, just extract all the text from that block.
    # And then do a regular expression search.
    # And then take a shot.


Or like this:

def ParseYuletideTreasureMetadata(self):
    """Extract metadata from Yuletidetreasure.org Beautiful Soup object."""
    # Fuck is this the nineties? Are there really NO DIVS in this code?
    # Or class attributes?
    # Or even fucking paragraph blocks?
    # Fuck it, I'm drinking.
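The Fanfiction.net approach (flatten the "gui_table1" block to plain text, then regex it) looks roughly like this. The sample string and its "Label: value - Label: value" layout are made up for illustration, not copied from the site; in real code the string would come from something like Beautiful Soup's `get_text()` on that block. `FIELD_PATTERNS` and `extract_fields` are hypothetical names:

```python
import re

# Made-up flattened text standing in for the "gui_table1" block's contents.
BLOCK_TEXT = "Rated: T - English - Romance - Words: 12,345 - Published: 6-13-12"

# One regex per field; each captures just the value after its label.
FIELD_PATTERNS = {
    "rating": re.compile(r"Rated:\s*([^-]+?)\s*(?:-|$)"),
    "words": re.compile(r"Words:\s*([\d,]+)"),
    "published": re.compile(r"Published:\s*(\d{1,2}-\d{1,2}-\d{2,4})"),
}


def extract_fields(text):
    """Run each regex over the flattened text; fields that don't match become None."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

`extract_fields(BLOCK_TEXT)` gives `{"rating": "T", "words": "12,345", "published": "6-13-12"}`. Returning `None` for missing fields, rather than raising, matters here: with no structural markup to anchor on, any layout change on the site silently breaks a pattern, and you want that to show up as gaps in the data, not a crashed batch run.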

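For pages like the Yuletide one, with no divs, classes, or paragraphs, about the only structure left is line breaks. A sketch of a fallback, assuming (hypothetically; this is not a description of the real site) that header fields sit on their own `<br>`-separated lines shaped like "Label: value". `BreakSplitter`, `labeled_fields`, and the sample page are illustrative names and data:

```python
import re
from html.parser import HTMLParser


class BreakSplitter(HTMLParser):
    """Flatten a page to plain-text lines, starting a new line at <br>, <p>, <hr>, etc."""

    BREAK_TAGS = {"br", "p", "hr", "div", "tr", "li"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br/> also lands here via HTMLParser's default handle_startendtag.
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Newlines in HTML source are not meaningful, so collapse all whitespace.
        self.parts.append(re.sub(r"\s+", " ", data))

    def lines(self):
        text = "".join(self.parts)
        return [line.strip() for line in text.split("\n") if line.strip()]


def labeled_fields(html, labels):
    """Return {label: value} for lines shaped like 'Label: value' whose label is wanted."""
    splitter = BreakSplitter()
    splitter.feed(html)
    splitter.close()
    fields = {}
    for line in splitter.lines():
        label, sep, value = line.partition(":")
        key = label.strip().lower()
        if sep and key in labels and key not in fields:  # keep the first occurrence
            fields[key] = value.strip()
    return fields


# Made-up nineties-style page: no classes, no divs, just <br>s.
SAMPLE_PAGE = """<html><body><center><b>An Example Story</b><br>
Fandom: Some Small Fandom<br>
Written for: a recipient<br>
</center>
Once upon a time...</body></html>"""
```

`labeled_fields(SAMPLE_PAGE, {"fandom", "written for"})` returns `{"fandom": "Some Small Fandom", "written for": "a recipient"}`. Keeping only the first match per label is a deliberate guess: the header comes before the story, so a later line that happens to contain "Fandom:" in the prose is ignored.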