Methods

activegit defines a single class whose methods and properties wrap git features such as tags and push/pull, so that those features can be applied in an active learning context.

class activegit.ActiveGit(repopath, bare=False, shared='group')

Uses a git repo to keep track of active learning data and the classifier.

The standard set of files is: ‘training.pkl’, ‘testing.pkl’, and ‘classifier.pkl’. The first two each contain a dictionary with features as keys and target labels (e.g., 0/1) as values. The third file contains the classifier (e.g., from sklearn).
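As a rough illustration of the data layout described above, the sketch below pickles and reloads a features-to-label dictionary using only the standard library. The feature values, the use of tuples as keys, and the temporary path are assumptions made for the example; activegit itself is not imported.

```python
import os
import pickle
import tempfile

# Hypothetical example data: each key is a feature vector stored as a tuple
# (dictionary keys must be hashable), each value is a 0/1 target label.
training = {
    (0.12, 3.4, 1.0): 0,
    (0.98, 1.1, 0.0): 1,
}

# Write the dictionary to a pickle file named like the standard training file.
path = os.path.join(tempfile.mkdtemp(), "training.pkl")
with open(path, "wb") as fh:
    pickle.dump(training, fh)

# Read it back, as a consumer of the file would.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

# Split the dictionary into parallel feature and target lists.
features = list(loaded.keys())
targets = list(loaded.values())
```

Keeping features as dictionary keys means each distinct feature vector can carry only one label, which fits a labeling workflow where an example is classified once.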

Tags are central to tracking the classifier and data. A new repo starts with empty files and a tag “initial”. Branch ‘master’ holds the latest version, and branch ‘working’ is used for the active session. After a new version is committed, the working branch is merged into master and deleted, and a new working branch is checked out.
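The tag and branch cycle described above can be pictured with a small in-memory model. This is a toy sketch, not activegit code: the class ToyRepo and its attributes are hypothetical, and no git commands are run.

```python
import copy


class ToyRepo:
    """In-memory stand-in for the tag/branch cycle; not activegit code."""

    def __init__(self):
        empty = {"training.pkl": {}, "testing.pkl": {}, "classifier.pkl": None}
        # A new repo starts with empty files and a tag "initial".
        self.branches = {"master": empty}
        self.tags = {"initial": copy.deepcopy(empty)}
        # The active session happens on a 'working' branch cut from master.
        self.branches["working"] = copy.deepcopy(empty)

    def commit_version(self, version):
        # Merge 'working' into master and delete the working branch.
        working = self.branches.pop("working")
        self.branches["master"] = working
        # Tag the committed state under the new version name.
        self.tags[version] = copy.deepcopy(working)
        # Check out a fresh working branch for the next session.
        self.branches["working"] = copy.deepcopy(working)


repo = ToyRepo()
repo.branches["working"]["training.pkl"][(0.1, 0.2)] = 1  # label one example
repo.commit_version("v1")
```

After the call, the model holds two tags ("initial" and "v1"), master matches the committed state, and a new working branch is ready for further labeling.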

Setting bare=True creates a bare git repo that can be shared (cloned) by a group locally or via git daemon sharing.

classifier

Returns classifier from classifier.pkl

commit_version(version, msg=None)

Add tag, commit, and push changes

initializerepo()

Fill empty directory with products and make first commit

isvalid

Checks whether contents of repo are consistent with standard set.

set_version(version, force=True)

Sets the version name for the current state of repo

show_version_info(version)

Summarizes info of a particular version (a la “git show version”)

testing_data

Returns data dictionary from testing.pkl

training_data

Returns data dictionary from training.pkl

update()

Pull latest versions/tags, if linked to a remote (e.g., github).

version

Current version checked out.

versions

Sorted list of versions committed thus far.
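The version and versions properties can be combined into a short listing that marks the checked-out version. The helper name list_versions is hypothetical; it relies only on the two properties as described above and does not change the state of the repo.

```python
def list_versions(repo):
    """Return one line per committed version, marking the current one with '*'.

    `repo` is assumed to be an ActiveGit instance, or any object exposing
    the same `version` and `versions` properties.
    """
    current = repo.version
    lines = []
    for name in repo.versions:
        prefix = "* " if name == current else "  "
        lines.append(prefix + name)
    return lines
```

Printing the result with "\n".join(list_versions(repo)) gives an overview similar in spirit to the output of "git branch".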

write_classifier(clf)

Writes classifier object to pickle file

write_testing_data(features, targets)

Writes the testing data dictionary (features mapped to targets) to testing.pkl

write_training_data(features, targets)

Writes the training data dictionary (features mapped to targets) to training.pkl
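One plausible session, sketched under assumptions, writes the three standard files and then commits them as a new version. The helper commit_new_version and its argument layout are hypothetical; only the method names and signatures listed above are taken from this page, and the exact types expected for features and targets are not specified here.

```python
def commit_new_version(repo, version, clf, train, test, msg=None):
    """Write data and classifier, then commit them under a new version name.

    `repo` is assumed to be an ActiveGit instance. `train` and `test` are
    (features, targets) pairs; their element types are an assumption.
    """
    if version in repo.versions:
        # Guard added for the example: avoid reusing an existing version name.
        raise ValueError("version %r already exists" % version)
    train_features, train_targets = train
    test_features, test_targets = test
    repo.write_training_data(train_features, train_targets)
    repo.write_testing_data(test_features, test_targets)
    repo.write_classifier(clf)
    repo.commit_version(version, msg=msg)
```

Grouping the writes and the commit in one function keeps the three files in step, so a tagged version never pairs a classifier with data it was not trained on.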