Using steve - library

By day, steve is a CLI of world renown. By night, steve is a Python library capable of great cunning. This chapter covers the utility functions in steve.util.

Writing steve scripts

steve can be used to batch-process a bunch of JSON files.

Most batch processing works this way:

  1. get the config file (steve.util.get_project_config())
  2. get all the json files (steve.util.load_json_files())
  3. iterate through the json files transforming the data (Python for loop)
  4. save the json files (steve.util.save_json_files())
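
The four steps map onto a small script skeleton. The sketch below keeps step 3 in its own function so it can be tried without a project on disk. apply_transform and set_default_language are hypothetical helpers (not part of steve.util), and the in-memory list stands in for what load_json_files() returns.

```python
import json
from typing import Callable, List, Tuple

# A (filename, metadata dict) pair, the shape load_json_files() returns.
Record = Tuple[str, dict]


def apply_transform(data: List[Record], transform: Callable[[dict], None]) -> List[str]:
    """Step 3: run ``transform`` on every metadata dict, in place.

    Returns the filenames whose contents changed, which is handy for
    logging what a batch run actually touched.
    """
    changed = []
    for filename, contents in data:
        # Snapshot the dict as canonical JSON so edits can be detected.
        before = json.dumps(contents, sort_keys=True)
        transform(contents)
        if json.dumps(contents, sort_keys=True) != before:
            changed.append(filename)
    return changed


def set_default_language(contents: dict) -> None:
    """Hypothetical example transform: fill in a missing language."""
    if not contents.get('language'):
        contents['language'] = 'English'


if __name__ == '__main__':
    # In a real script, steps 1, 2 and 4 would wrap this call:
    #   cfg = steve.util.get_project_config()
    #   data = steve.util.load_json_files(cfg)
    #   apply_transform(data, set_default_language)
    #   steve.util.save_json_files(cfg, data)
    data = [('a.json', {'language': ''}), ('b.json', {'language': 'Italian'})]
    print(apply_transform(data, set_default_language))
```

Because the transform mutates the dicts in place, the same data list can be handed straight to save_json_files() afterwards.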

steve.util

steve.util.with_config(fun)

Decorator that passes the project config as the first argument

Raises: ConfigNotFound – if the config file can’t be found

This calls get_project_config(). If that returns a configuration object, the decorator passes it as the first argument to the decorated function. If get_project_config() doesn’t return a config object, the decorator raises ConfigNotFound.

Example:

>>> @with_config
... def config_printer(cfg):
...     print('Config!: {0!r}'.format(cfg))
...
>>> config_printer()  # if it found a config
Config!: ...
>>> config_printer()  # if it didn't find a config
Traceback (most recent call last):
    ...
steve.util.ConfigNotFound: steve.ini could not be found.

steve.util.get_project_config()

Finds and opens the config file in the current directory

Raises: ConfigNotFound – if the config file can’t be found
Returns: the configuration object
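
A rough sketch of what finding and opening the config involves, assuming steve.ini is an INI file that the standard configparser module can read. The directory parameter is a hypothetical addition for testing (the real function takes no arguments and uses the current directory), and the [project] section in the usage check is likewise only an assumed example.

```python
import configparser
import os


class ConfigNotFound(Exception):
    """Local stand-in for steve.util.ConfigNotFound."""


def get_project_config(directory=None):
    """Find and parse steve.ini in ``directory`` (default: the cwd)."""
    path = os.path.join(directory or os.getcwd(), 'steve.ini')
    cfg = configparser.ConfigParser()
    # read() silently skips missing files and returns the files it parsed,
    # so an empty list means there was no steve.ini to read.
    if not cfg.read(path):
        raise ConfigNotFound('steve.ini could not be found.')
    return cfg
```

Checking the return value of read() is what turns a silently missing file into an explicit error.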

steve.util.html_to_markdown(text)

Converts an HTML string to equivalent Markdown

Parameters: text – the HTML string to convert
Returns: Markdown string

Example:

>>> html_to_markdown('<p>this is <b>html</b>!</p>')
u'this is **html**!'
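
To illustrate what the conversion does, here is a deliberately tiny converter built on the standard html.parser module. It is not steve's implementation: it only handles paragraphs, line breaks and a few inline tags, whereas a real converter also deals with links, lists, headings and more.

```python
from html.parser import HTMLParser


class _MarkdownBuilder(HTMLParser):
    """Collects text and wraps a few inline tags in Markdown markers."""

    # Only these tags get markers; any other tag is reduced to its text.
    MARKERS = {'b': '**', 'strong': '**', 'i': '*', 'em': '*', 'code': '`'}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == 'br':
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag in self.MARKERS:
            self.parts.append(self.MARKERS[tag])
        elif tag == 'p':
            # Markdown separates paragraphs with a blank line.
            self.parts.append('\n\n')

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(text):
    builder = _MarkdownBuilder()
    builder.feed(text)
    builder.close()  # flush any buffered trailing text
    return ''.join(builder.parts).strip()
```

With this sketch, html_to_markdown('<p>one</p><p>two</p>') yields the two paragraphs separated by a blank line.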

steve.util.load_json_files(config)

Parses and returns all the video metadata JSON files for a project

Parameters: config – the configuration object
Returns: list of (filename, data) tuples, where filename is the name of the JSON file as a string and data is a Python dict of metadata.
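
A minimal sketch of the loading side, assuming the JSON files sit together in one directory. This version takes that directory path directly instead of a config object (steve works the location out from its config), and returning the full path as the filename is an assumption too.

```python
import json
import os
from typing import List, Tuple


def load_json_files(json_dir: str) -> List[Tuple[str, dict]]:
    """Parse every *.json file in ``json_dir`` into (filename, data) tuples."""
    results = []
    # Sort so batch runs visit files in a stable, predictable order.
    for name in sorted(os.listdir(json_dir)):
        if not name.endswith('.json'):
            continue  # ignore notes, backups and other non-JSON files
        path = os.path.join(json_dir, name)
        with open(path, encoding='utf-8') as fp:
            results.append((path, json.load(fp)))
    return results
```

Keeping the filename next to its data is what lets the save step write each dict back to the file it came from.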

steve.util.save_json_files(config, data, **kw)

Saves a list of (filename, data) tuples as JSON files

Parameters:
  • config – the configuration object
  • data – list of (filename, data) tuples, where filename is the name of the JSON file as a string and data is a Python dict of metadata

Note

This is the save side of load_json_files(). The output of that function is the data argument for this one.

steve.util.save_json_file(config, filename, contents, **kw)

Saves a single json file

Parameters:
  • config – configuration object
  • filename – name of the JSON file to write
  • contents – Python dict to save
  • kw – any keyword arguments accepted by json.dump
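
A minimal sketch of both save functions. The config argument is dropped here for self-containment, and having save_json_files forward **kw to json.dump (as save_json_file is documented to do) is an assumption about the batch version.

```python
import json
from typing import Iterable, Tuple


def save_json_file(filename: str, contents: dict, **kw) -> None:
    """Write one metadata dict to ``filename``; ``kw`` goes to json.dump."""
    with open(filename, 'w', encoding='utf-8') as fp:
        json.dump(contents, fp, **kw)


def save_json_files(data: Iterable[Tuple[str, dict]], **kw) -> None:
    """Write back every (filename, contents) pair from load_json_files()."""
    for filename, contents in data:
        save_json_file(filename, contents, **kw)
```

Passing indent=2, sort_keys=True through **kw gives stable, diff-friendly output, which helps when the JSON files are kept under version control.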

steve.util.scrapevideo(video_url)

Scrapes the URL and fixes up the resulting data

Parameters: video_url – URL of the video to scrape.
Returns: Python dict of metadata

Example:

>>> scrapevideo('http://www.youtube.com/watch?v=ywToByBkOTc')
{'url': 'http://www.youtube.com/watch?v=ywToByBkOTc', ...}

steve.util.verify_video_data(data, category=None)

Verify the data in a single json file for a video.

Parameters:
  • data – The parsed contents of a JSON file. This should be a Python dict.
  • category

    The category as specified in the steve.ini file.

    If steve.ini specifies a category, every data file must either have that same category or no category at all.

    This is None if steve.ini specifies no category, in which case every data file must have a category.

Returns:

list of error strings.
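
The category rule is easy to misread, so here is a sketch of just that check. check_category is a hypothetical helper and its error wording is invented; verify_video_data itself may check other fields as well.

```python
from typing import List, Optional


def check_category(data: dict, category: Optional[str] = None) -> List[str]:
    """Return error strings for the category rule described above."""
    errors = []
    file_category = data.get('category')  # None or '' means "no category"
    if category is None:
        # steve.ini has no category, so every file must supply its own.
        if not file_category:
            errors.append('category is required because steve.ini sets none')
    elif file_category and file_category != category:
        # steve.ini has a category: files may omit it or match it exactly.
        errors.append('category must be {0!r} or empty, not {1!r}'.format(
            category, file_category))
    return errors
```

An empty list means the file passes this particular check.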

steve.util.verify_json_files(json_files, category=None)

Verifies the data in a bunch of JSON files and prints the results.

Parameters:
  • json_files – list of (filename, parsed json data) tuples to call verify_video_data() on
  • category

    The category as specified in the steve.ini file.

    If steve.ini specifies a category, every data file must either have that same category or no category at all.

    This is None if steve.ini specifies no category, in which case every data file must have a category.

Returns:

dict mapping filenames to list of error strings
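
A sketch of the batch verifier's shape. To stay self-contained it takes the per-file check as a hypothetical verify argument instead of calling verify_video_data() directly, and it records every file (with an empty list when a file passes); steve may only report files that have errors.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Signature of a per-file check such as verify_video_data(data, category).
Verifier = Callable[[dict, Optional[str]], List[str]]


def verify_json_files(json_files: Iterable[Tuple[str, dict]],
                      verify: Verifier,
                      category: Optional[str] = None) -> Dict[str, List[str]]:
    """Run ``verify`` on each file, print any errors, return them by filename."""
    filename_to_errors = {}
    for filename, data in json_files:
        errors = verify(data, category)
        filename_to_errors[filename] = errors
        # Print as we go, matching the documented "prints the output".
        for error in errors:
            print('{0}: {1}'.format(filename, error))
    return filename_to_errors
```

Returning the mapping as well as printing it lets a script decide, for example, to exit with a non-zero status when any list is non-empty.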

Recipes

Here’s some sample code for batch transforms. Each script should live in the project’s root directory, next to the steve.ini file. Make sure the steve package is installed, then run the script with the Python interpreter:

python name_of_my_script.py

Or structure and run it however you like.

Update language

This fixes the language property in each JSON file: it sets it to “Italian” if the word “Italiana” appears in the summary, and to “English” otherwise.

import steve.util

cfg = steve.util.get_project_config()
data = steve.util.load_json_files(cfg)

for fn, contents in data:
    print(fn)

    # If 'Italiana' shows up in the summary, set the language
    # to Italian. A missing or null summary is treated as empty.
    if 'Italiana' in (contents.get('summary') or ''):
        contents['language'] = u'Italian'
    else:
        contents['language'] = u'English'

steve.util.save_json_files(cfg, data)

Move speaker from summary to speakers

This removes the first line of the summary and puts it in the speakers field.

import steve.util

cfg = steve.util.get_project_config()
data = steve.util.load_json_files(cfg)

for fn, contents in data:
    print(fn)

    # If the data already has speakers, then we assume we've already
    # operated on it and don't operate on it again.
    if contents['speakers']:
        continue

    # Skip files whose summary is missing or blank; otherwise we would
    # add an empty string as a speaker.
    summary = contents.get('summary') or ''
    if not summary.strip():
        continue
    summary = summary.split('\n')

    # The speakers field is a list of strings. So we remove the first
    # line of the summary, strip the whitespace from it, and put that
    # in the speakers field.
    contents['speakers'].append(summary.pop(0).strip())

    # Put the rest of the summary back.
    contents['summary'] = '\n'.join(summary)

steve.util.save_json_files(cfg, data)

Convert summary and description to Markdown

This converts the summary and description fields from HTML to Markdown.

import steve.util

cfg = steve.util.get_project_config()
data = steve.util.load_json_files(cfg)

for fn, contents in data:
    print(fn)

    contents['summary'] = steve.util.html_to_markdown(
        contents.get('summary', ''))

    contents['description'] = steve.util.html_to_markdown(
        contents.get('description', ''))

steve.util.save_json_files(cfg, data)