epershand: "Python: programming the way Guido indented it" (python)
epershand ([personal profile] epershand) wrote 2012-06-13 11:48 am

A brief love note to the AO3

You know, I kvetch about the AO3 a lot, and their coding team has been doing a lot of hustling lately without getting a lot of love, but damn. Sometimes I am just hit by how fucking RIGHT they've done something.

For example: right now I'm helping out [personal profile] starlady with a fandom studies project by writing her a script that looks at fanfiction HTML and extracts fandom, ship, publication date, etc. I'm writing dedicated parsers for a few major fic archives.

This is (roughly speaking) what my code looks like for the AO3:

def GetAo3Metadata(self):
    """Extract metadata from Archive of Our Own Beautiful Soup object."""
    self.metadata.author = # Find the "a" tag with the class "login author"
    self.metadata.title = # Find the "h2" tag with the class "title heading"
    self.metadata.rating = # Get all items from the list with the class "rating tags"
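For the curious, here is a minimal runnable sketch of what "find the tag with this class" amounts to. It is not my actual parser: it uses the standard library's html.parser instead of Beautiful Soup so it runs with no installs, and the SAMPLE markup, the "dd" wrapper around the rating list, and the names ClassTextExtractor and find_text are all made up for illustration. Only the class names come from the pseudocode above.

```python
from html.parser import HTMLParser

# Invented sample markup. Only the class names come from the post;
# the "dd" wrapper and everything else is an assumption.
SAMPLE = """
<h2 class="title heading">An Example Fic</h2>
<a class="login author" href="/users/example">example_author</a>
<dd class="rating tags"><ul><li><a class="tag">Teen And Up Audiences</a></li></ul></dd>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of each element with a given tag name and classes."""

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.classes = set(classes.split())
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []   # one text string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: only track nested tags of the same name.
            if tag == self.tag:
                self.depth += 1
            return
        if tag != self.tag:
            return
        have = set((dict(attrs).get("class") or "").split())
        if self.classes <= have:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def find_text(html, tag, classes):
    """Return whitespace-normalized text of every matching element."""
    parser = ClassTextExtractor(tag, classes)
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.results]


metadata = {
    "author": find_text(SAMPLE, "a", "login author"),
    "title": find_text(SAMPLE, "h2", "title heading"),
    "rating": find_text(SAMPLE, "dd", "rating tags"),
}
print(metadata)
```

The point is that each field is one lookup by tag and class, with no guessing about where in the page it lives.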

This is, roughly speaking, what the code looks like for everything else:

def ParseFanfictionNetMetadata(self):
    """Extract metadata from Fanfiction.net Beautiful Soup object."""
    # Find the block called "gui_table1" because, you know, that's meaningful.
    # Fuck it, just extract all the text from that block.
    # And then do a regular expression search.
    # And then take a shot.
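A rough sketch of the "extract all the text, then regex it" fallback, for contrast. The SAMPLE_TEXT string is invented and is not the real Fanfiction.net layout, and the field labels and patterns are assumptions too; the only part taken from the pseudocode above is the approach of searching flattened text.

```python
import re

# Invented stand-in for the text dumped out of an unlabeled block.
# The real page text will differ; treat these labels as assumptions.
SAMPLE_TEXT = "Fiction Rated: T - English - Romance - Published: 06-13-12 - Complete"

# One hypothetical pattern per field, each capturing the value after its label.
FIELD_PATTERNS = {
    "rating": re.compile(r"Rated:\s*(\S+)"),
    "published": re.compile(r"Published:\s*(\d{2}-\d{2}-\d{2})"),
}


def scrape_fields(text):
    """Search flattened text for each field; missing fields come back as None."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None
    return found


print(scrape_fields(SAMPLE_TEXT))
```

Every pattern here depends on the exact wording and order of the page text, which is why this style of parser breaks the moment the site tweaks its layout.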

Or like this:

def ParseYuletideTreasureMetadata(self):
    """Extract metadata from Yuletidetreasure.org Beautiful Soup object."""
    # Fuck, is this the nineties? Are there really NO DIVS in this code?
    # Or class attributes?
    # Or even fucking paragraph blocks?
    # Fuck it, I'm drinking.
