Monday, February 29, 2016

coordinate_music, keeping my music tidy

I wrote a set of tools to keep my local music library "coordinated", to have perfect consistency between filename, id3 tag, and Spotify's metadata.

When importing music, coordinate_music will walk through audio files and use the Spotify API to search for the associated track. This can be done either one album at a time or track by track. It will present you with a list of candidates; you can then confirm, type "hear0" to hear the original, or type "hear1" to hear the first candidate. Here's what it looks like when searching by track:

Here's what it looks like when searching by album:

This association is saved in the website ID3 tag in the audio file (mp3, m4a, or flac). After importing music, this set of scripts can:
  • check that every directory and filename is formatted correctly.
  • check for consistency between filename, id3 tag, and Spotify's metadata, and set tags from the filename or vice versa.
  • create .url files that open directly to Spotify Desktop.
  • search Spotify interactively by artist, title, album to find a corresponding Spotify track.
  • save all metadata to a utf-8 text file, which can be useful for backup.

Other features include, if enabled:

  • opening a .mp3 redirects to the associated track to play in Spotify desktop, which often has higher audio quality.
  • typing "BRK" into any interactive text prompt to view the current directory in UI and retry the current operation.
  • filenames in the format .sv.mp3 are synced to an external directory for backup.
  • working with Spotify playlists (viewing tracks, removing tracks, creating playlist from directory of mp3s).
  • saving a Spotify playlist to text file of song lengths and names.
  • indicating a song's subjective "rating" by its bitrate.
  • renaming files in a directory based on Spotify playlist.
  • saving disk space, by interactively walking through directories, and
    • if low bitrate and Spotify's 'popularity' data indicates high popularity,
    • replace the file with a .url linking to Spotify, after asking the user.
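The disk-space feature described above could be sketched roughly as follows. This is only an illustration, not coordinate_music's actual code: the function names, the bitrate and popularity thresholds, and the `confirm` callback are all hypothetical. The .url file uses the standard Windows InternetShortcut layout, and Spotify reports popularity as a number from 0 to 100.

```python
import os


def should_replace_with_shortcut(bitrate_kbps, popularity,
                                 max_bitrate_kbps=128, min_popularity=50):
    """Return True when a file is low bitrate and the track is popular.

    The thresholds are hypothetical defaults, not coordinate_music's values.
    """
    return bitrate_kbps <= max_bitrate_kbps and popularity >= min_popularity


def write_url_shortcut(path, spotify_uri):
    """Write a Windows .url (InternetShortcut) file pointing at spotify_uri."""
    with open(path, 'w') as f:
        f.write('[InternetShortcut]\n')
        f.write('URL=%s\n' % spotify_uri)


def replace_with_shortcut(audio_path, spotify_uri, confirm):
    """Replace audio_path with a .url file, but only if confirm() agrees.

    confirm is a hypothetical callback that asks the user and returns a bool.
    Returns the path of the new shortcut, or None if the user declined.
    """
    if not confirm(audio_path):
        return None
    shortcut_path = os.path.splitext(audio_path)[0] + '.url'
    # write the shortcut first, so the audio file is only removed on success
    write_url_shortcut(shortcut_path, spotify_uri)
    os.remove(audio_path)
    return shortcut_path
```

Writing the shortcut before removing the audio file means that a failure partway through leaves the original file in place.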
Tests pass on Linux (latest Linux Mint) and Windows (7 and later supported).

See the source code, and a more complete explanation, on GitHub.

Copying files in Python without race conditions

When copying files in Python, shutil.copy (and shutil.copy2) will silently overwrite the destination file if it already exists. At times this is the desired behavior, but more often I want to prevent overwriting the destination. A "naive" check would be this:
import shutil
from os.path import exists

def supposedlySaferCopy(srcfile, destfile):
    # naive: the destination can appear between this check and the copy
    if exists(destfile):
        raise IOError('destination already exists')
    shutil.copy(srcfile, destfile)
But it has a race condition: there is a small window of time between checking for existence and running the copy, during which another process can create the destination. Sometimes this check is a safeguard, for example to make sure file operations in a complex script are not overwriting data when not expected to. In general this pattern can also be a security issue, e.g. a type of symlink race.

On Windows one can call the Windows API directly; CopyFile takes a parameter for preventing overwrite, and MoveFileEx takes a flag that controls whether an existing destination is replaced. This can be done in pure Python because ctypes is built into Python's standard library (in 2.5 and later). For Posix systems, I wrote a copyFilePosixWithoutOverwrite function. Opening with O_CREAT and O_EXCL together makes the open call fail if the file already exists, so checking for existence and creating the file happen in a single atomic step. Here are my copy and move implementations:
import os
import shutil
import sys
from os.path import exists

def copy(srcfile, destfile, overwrite):
    if not exists(srcfile):
        raise IOError('source path does not exist')
    if srcfile == destfile:
        pass
    elif sys.platform == 'win32':
        from ctypes import windll, c_wchar_p, c_int
        failIfExists = c_int(0) if overwrite else c_int(1)
        res = windll.kernel32.CopyFileW(c_wchar_p(srcfile), c_wchar_p(destfile), failIfExists)
        if not res:
            raise IOError('CopyFileW failed')
    else:
        if overwrite:
            shutil.copy(srcfile, destfile)
        else:
            copyFilePosixWithoutOverwrite(srcfile, destfile)

def move(srcfile, destfile, overwrite):
    if not exists(srcfile):
        raise IOError('source path does not exist')
    if srcfile == destfile:
        pass
    elif sys.platform == 'win32':
        from ctypes import windll, c_wchar_p, c_int
        # 1 is MOVEFILE_REPLACE_EXISTING
        replaceExisting = c_int(1) if overwrite else c_int(0)
        res = windll.kernel32.MoveFileExW(c_wchar_p(srcfile), c_wchar_p(destfile), replaceExisting)
        if not res:
            raise IOError('MoveFileExW failed')
    else:
        copy(srcfile, destfile, overwrite)
        # remove the source so that this is a move rather than a copy
        os.unlink(srcfile)

def copyFilePosixWithoutOverwrite(srcfile, destfile):
    # fails if destination already exists. with O_CREAT | O_EXCL, the open call
    # atomically creates the file or fails; raises OSError on failure.
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    file_handle = os.open(destfile, flags)
    with os.fdopen(file_handle, 'wb') as fdest:
        with open(srcfile, 'rb') as fsrc:
            while True:
                # copy in chunks (a 1 MB chunk size is assumed here)
                buffer = fsrc.read(1024 * 1024)
                if not buffer:
                    break
                fdest.write(buffer)
Fairly comprehensive tests and more file utilities can be found on my GitHub page here.
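For readers on Python 3.3 or later, the same exclusive-create idea can be written without os.open, because the built-in open accepts an 'x' mode that fails if the file already exists. The following is a minimal sketch under that assumption; copy_without_overwrite is a hypothetical name and is not part of the utilities described above.

```python
import shutil


def copy_without_overwrite(srcfile, destfile):
    """Copy srcfile to destfile, raising FileExistsError if destfile exists.

    Mode 'xb' asks the OS to create the file exclusively, so the existence
    check and the creation happen in a single step.
    """
    with open(srcfile, 'rb') as fsrc:
        with open(destfile, 'xb') as fdest:
            shutil.copyfileobj(fsrc, fdest)
```

Unlike shutil.copy, this sketch copies only the file contents and not the permission bits.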

In Python 2, starting a Windows process with non-ascii characters

I recently encountered an exception in Python 2, using subprocess on Windows. If the process name or any of the arguments contain non-ascii/Unicode characters, an error like the following is raised: UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 5: ordinal not in range(128).

The issue was opened several years ago, on the official bug tracker, and fixed in Python 3 but not Python 2. It looks like the ultimate source of the issue is the use internally of CreateProcessA instead of CreateProcessW. (Some of the workarounds on this page, like specifying a code page, aren't full solutions since they'll still fail for most unicode characters).

Here's my workaround. It uses the winprocess module, which is MIT licensed and available here as well as many other places on GitHub.

import sys

def runWithoutWaitUnicode(listArgs):
    # in Windows, non-ascii characters cause subprocess.Popen to fail.
    import subprocess
    if sys.platform != 'win32' or all(isinstance(arg, str) for arg in listArgs):
        # no unicode arguments, so subprocess can be used directly
        p = subprocess.Popen(listArgs, shell=False)
        return p.pid
    else:
        import winprocess
        import types
        if isinstance(listArgs, types.StringTypes):
            combinedArgs = listArgs
        else:
            combinedArgs = subprocess.list2cmdline(listArgs)
        combinedArgs = unicode(combinedArgs)
        executable = None
        close_fds = False
        creationflags = 0
        env = None
        cwd = None
        startupinfo = winprocess.STARTUPINFO()
        handle, ht, pid, tid = winprocess.CreateProcess(executable, combinedArgs,
            None, None,
            int(not close_fds),
            creationflags,
            winprocess.EnvironmentBlock(env),
            cwd, startupinfo)
        return pid
This only accounts for CreateProcess, and not ShellExecute (i.e. passing shell=True to subprocess). However, you can use the "start" command as a way to ShellExecute. For example, in Windows, to open a file with its default program, you can use runWithoutWaitUnicode([u'cmd', u'/c', u'start', filePath]). (As a side note, if a directory name is passed, the directory will be opened in Explorer UI, which can be useful).

For tests, including tests that specifically exercise the Unicode case that was previously broken, see my GitHub page here.
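One detail worth knowing when building a "start" command line: cmd's start treats the first quoted argument as a window title, so a path that gets quoted (for example because it contains a space) can be mistaken for a title. A common way around this is to pass an empty title first. The sketch below, written for Python 3, only shows how the argument list is combined into one command line; start_command_line is a hypothetical helper and the file name is made up.

```python
import subprocess


def start_command_line(file_path):
    """Return the argument list and combined command line for cmd's start.

    The empty string becomes "" on the command line and is taken by start
    as the window title, so a quoted file path is not mistaken for one.
    """
    args = ['cmd', '/c', 'start', '', file_path]
    return args, subprocess.list2cmdline(args)


args, combined = start_command_line(u'caf\u00e9 notes.txt')
```

Here combined ends with the empty title followed by the quoted path, and args has the same shape as the list passed to runWithoutWaitUnicode above.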

Thursday, February 11, 2016

Adding features to Create Synchronicity

Create Synchronicity is a lightweight open source backup and synchronization program. After choosing a source directory and a destination directory, it will send updated files from the source to the destination. It supports previewing, scheduled actions, filtering by file type, and checksum verification.

Although I use dedicated backup software, I've found Create Synchronicity useful for ad-hoc synchronization like maintaining a mirror of my music library on an external hard drive. I recently modified Create Synchronicity's source code to add some new features to make it even more useful.

Adding a Context Menu

After selecting item(s) in the Preview list, right-click to show my new context menu.
  • Show Differences...
    • Highlights differences between the files, using winmerge.exe or other diff/merge software.
  • Copy Source to Destination...
    • Selectively sync only the files that are highlighted, after showing a preview.
  • Copy Destination to Source...
    • "Reverse sync" (from destination to source) the files that are highlighted, after showing a preview.
  • Keep Source and Destination...
    • In some cases, you want to keep both the source version of the file and the destination version of the file. In order to do this, "Keep Source and Destination" appends a timestamp to the destination filename and copies the file to both locations, after showing a preview.

Additional settings

To turn on these settings, press Ctrl+Alt+E to enable "expert" features. From now on, the Settings page will show this menu in the bottom left:
  • Check for newly added contents before deleting folders
    • Time can pass between the user running Preview and Sync. New files added during this window could be deleted if the parent directory was marked for deletion in the Preview. Turn on this check to eliminate the race condition.
  • Show yellow icon if destination is newer
    • When in "strict mirror" mode, show a yellow icon for files where the destination (about to be overwritten) is more recent than the source.
  • Potential speedup when MD5 and compare file size are enabled
    • Reordered code to reduce the number of checksums needed.
  • Tests
    • Low level tests cover every branch of newly added functions, every combination of file/folder, create/update/delete. Component tests write to a temp directory and verify all directories, file contents written as expected.
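The "check for newly added contents" idea can be illustrated with a short sketch. Create Synchronicity itself is not written in Python, and this is not its implementation; the function names are hypothetical. The sketch takes a snapshot of a folder at preview time and, just before deletion, reports anything that was not in the snapshot.

```python
import os


def list_contents(directory):
    """Return the set of relative paths of everything under directory."""
    found = set()
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            full = os.path.join(root, name)
            found.add(os.path.relpath(full, directory))
    return found


def new_since_preview(directory, previewed):
    """Return paths under directory that were not present at preview time."""
    return list_contents(directory) - set(previewed)
```

At preview time one would store list_contents(folder); at sync time the folder is deleted only if new_since_preview(folder, stored) is empty. In this sketch a short window remains between the check and the delete itself.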
Download link and source code coming soon!

Wednesday, January 20, 2016

A Simple Interface to Read/Write Audio Metadata in Python

I wrote a small wrapper for Mutagen that makes it easier to read/write audio metadata (tags for mp3, ogg, flac, m4a/mp4) in Python. Here's an example:
    o = EasyPythonMutagen('file.mp3')
    o.set('title', 'song title')
    o = EasyPythonMutagen('file.flac')
    o.set('title', 'song title')
    o = EasyPythonMutagen('file in id3_v23.mp3', use_id3_v23=True)
    o.set('title', u'title with unicode: \u0107')

A few differences from Mutagen:
  • You can use the same class and interface for different audio formats.
  • You won't need to catch exceptions in case the mp3 doesn't have an id3 tag yet.
  • You won't have to use a low level interface to write tags in id3v2.3, for compatibility with Windows and smartphone apps.

It'd be nice to add id3v2.3 support in EasyID3 to the mutagen project at some point. In the meantime I'll use this wrapper.

See the source and download it on GitHub.

Other small features of easypythonmutagen:

  • Provides method to get the empirical ("actual") bitrate in addition to stated bitrate.
  • The "get" methods directly return a value, instead of a list.
  • Intentionally disallows adding unrecognized fields. A typo like o['aartist'] fails instead of succeeding silently.
  • Added a few fields, like 'Composer' and 'Website' for mp4/m4a.
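Two of the behaviors listed above, rejecting unrecognized fields and returning a single value instead of a list, can be shown with a toy class. This is a stand-in for illustration only: StrictTags, its field list, and its in-memory storage are hypothetical and are not the real easypythonmutagen code, which reads and writes actual audio files through Mutagen.

```python
class StrictTags(object):
    """Toy stand-in for a tag wrapper; not the real EasyPythonMutagen class."""

    # hypothetical set of recognized field names
    allowed_fields = ('title', 'artist', 'album', 'composer', 'website')

    def __init__(self):
        # field name -> list of values, the shape Mutagen's easy interfaces use
        self._values = {}

    def _check(self, field):
        if field not in self.allowed_fields:
            raise KeyError('unrecognized field: %r' % (field,))

    def set(self, field, value):
        self._check(field)
        self._values[field] = [value]

    def get(self, field, default=None):
        """Return the first stored value directly, rather than a list."""
        self._check(field)
        values = self._values.get(field)
        return values[0] if values else default
```

With this shape, a misspelled field name raises KeyError on both get and set, so the typo is caught where it is made.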

Saturday, May 2, 2015

How to write a program using Skia on Windows

Skia is an open source 2D graphics library which provides common APIs that work across a variety of hardware and software platforms. It serves as the graphics engine for Google Chrome and Chrome OS, Android, Mozilla Firefox and Firefox OS, and many other products. Skia is an alternative to the Cairo library.

Posting this in case it helps anyone else.

Prerequisites:
  • Visual Studio 2013 (including the Express or Community editions, which are free)
  • An unzipping tool like 7zip or WinRAR

The command prompt lines below should be run in the same session (i.e. it won't work if you close and reopen a new command prompt).

  • Download the depot_tools archive from the Install Depot Tools page
  • Use 7zip or WinRAR to Extract All to a path like c:\path\to\depot_tools (no spaces in path). The Windows built-in unzip might skip hidden files.
  • Open a command prompt
  • Run "cd c:\path\to\depot_tools"
  • Run "echo %PATH%"
  • In the output, if you already have Python installed and see a Python directory, you might want to remove this from the path. set PATH=x can do this for just this command session.
  • In the output, if you already have Git installed and see a Git directory, you might want to remove this from the path. set PATH=x can do this for just this command session.
  • Run "set PATH=%PATH%;c:\path\to\depot_tools" to add depot tools to the path
  • Run "gclient". This will download and sync the needed tools.
  • Make a directory like c:\path\to\skia (no spaces in path)
  • In the same command prompt, run "cd c:\path\to\skia"
  • Run git config --global user.name "Your Name"
  • Run git config --global user.email "your email address"
  • mkdir skia
  • cd skia
  • gclient config --name . --unmanaged
  • gclient sync
  • git checkout master
  • Run "set GYP_GENERATORS=msvs"
  • Run "python gyp_skia"
  • Run "ren out out86"
  • Run "python gyp_skia -D skia_arch_width=64"
  • Run "ren out out64"
  • Open .\out86\skia.sln in Visual Studio
  • For me, I only needed to build Release
  • For me, I didn't need these projects, and they also failed to build because they couldn't find Qt. Open Configuration Manager and, under the Debug/Release drop down, uncheck Build for the following: debugger, debugger_qt_mocs, pdfviewer, pdfviewer_lib
  • Hit Build Solution, and wait several minutes
  • When the build is done, you may see some compilation warnings/errors, but if the default project HelloWorld runs correctly (Ctrl+F5), it's likely that all of the important parts work.
  • Open .\out64\skia.sln in VS
  • Repeat the above steps for x64.
Now, to create an example project that doesn't need Google's gyp system:
  • Open Visual Studio and create a new project. Other languages > Visual C++ > Win32 > Win32 Console Application
  • In the Win32 Application Wizard, click Application Settings, uncheck Precompiled Header, check Empty Project.
  • Switch from Debug to Release
  • Go into the project's options, Configuration Properties > C/C++ > General > Additional Include Directories and add: c:\path\to\skia\include\core;c:\path\to\skia\include\config
  • Go into the project's options, Configuration Properties > C/C++ > Preprocessor > Preprocessor Definitions and add:
  • Go into the project's options, Configuration Properties > Linker > Input > Additional Dependencies and add (preferably as relative paths)
Then, add a main.cpp to the project, with the following code,
#include <string>
#include <fstream>

#include "SkCanvas.h"
#include "SkData.h"
#include "SkDocument.h"
#include "SkGraphics.h"
#include "SkSurface.h"
#include "SkImage.h"
#include "SkStream.h"
#include "SkString.h"

#include "..\effects\SkGradientShader.h"

// write the bitmap to a binary PPM (P6) file
void save_ppm(SkBitmap const& bitmap, std::string const& filename)
{
  SkAutoLockPixels l(bitmap);

  std::ofstream ofile(filename.c_str(), std::ios_base::binary | std::ios_base::trunc);
  if (ofile.is_open())
  {
    ofile << "P6 " << bitmap.width() << " " << bitmap.height() << " 255 ";

    for (int i = 0; i != bitmap.height(); i++)
    {
      for (int j = 0; j != bitmap.width(); j++)
      {
        SkColor const* c = bitmap.getAddr32(j, i);
        char buf[3] = { SkColorGetR(*c), SkColorGetG(*c), SkColorGetB(*c) };
        ofile.write(buf, 3);
      }
    }
  }
}

// draw a red rectangle
void TestSkia(SkCanvas& canvas)
{
  SkPaint paint;
  paint.setColor(SK_ColorRED);
  SkRect rect = {
    20, 20,
    50, 50
  };
  canvas.drawRect(rect, paint);
}

int main(int argc, char * const argv[])
{
  SkAutoGraphics ag;
  SkBitmap bitmap;
  int width = 800;
  int height = 600;
  bitmap.allocPixels(SkImageInfo::MakeN32Premul(width, height));
  SkCanvas canvas(bitmap);

  // start from a known background, then draw
  canvas.drawColor(SK_ColorWHITE);
  TestSkia(canvas);

  save_ppm(bitmap, "out.ppm");
  return 0;
}

// stub out openGl dependency, which isn't needed in this case.
extern "C"
{
#ifdef _WIN64
  PROC WINAPI __imp_wglGetProcAddress(LPCSTR)
  {
    return nullptr;
  }
  HGLRC WINAPI  __imp_wglGetCurrentContext()
  {
    return nullptr;
  }
#else
  PROC WINAPI _imp__wglGetProcAddress(LPCSTR)
  {
    return nullptr;
  }

  HGLRC WINAPI _imp__wglGetCurrentContext()
  {
    return nullptr;
  }
#endif
}
Running this little program will create a valid ppm file with a red rectangle!
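For anyone unfamiliar with the PPM format that save_ppm produces, here is a small standalone sketch in Python (not Skia, and not part of the project above) that builds the same kind of file: a short text header followed by raw RGB bytes, three per pixel. The function name make_ppm and its parameters are hypothetical.

```python
def make_ppm(width, height, rect, color, background=(255, 255, 255)):
    """Return the bytes of a binary PPM (P6) image with one filled rectangle.

    rect is (left, top, right, bottom); right and bottom are exclusive.
    color and background are (red, green, blue) tuples of values 0..255.
    """
    left, top, right, bottom = rect
    # header: magic number, width, height, maximum channel value
    header = ('P6 %d %d 255\n' % (width, height)).encode('ascii')
    rows = []
    for y in range(height):
        row = bytearray()
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            row.extend(color if inside else background)
        rows.append(bytes(row))
    return header + b''.join(rows)


# same dimensions and rectangle as the C++ example above
ppm_bytes = make_ppm(800, 600, (20, 20, 50, 50), (255, 0, 0))
```

Writing ppm_bytes to a file opened in 'wb' mode gives an image that most image viewers with PPM support should be able to open.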

To build for x64, you can create a new x64 target and update the lib directories from c:\path\to\skia\out86 to c:\path\to\skia\out64.

To add codecs for saving to different image types:
  • In Linker Inputs, add a reference to skia_codecs.lib
  • Add #include "..\images\SkForceLinking.h"

To add OpenGL:
  • remove the __imp_wglGetProcAddress and __imp_wglGetCurrentContext stubs
  • In Linker Inputs, add references to the following:

Install Depot Tools
Skia Quick Start Guides Windows

Tuesday, April 7, 2015

Copying files out of a VM guest machine

A nice benefit of using a guest VM is that the host machine is protected from any malware that infects the guest (barring security vulnerabilities in the VM software itself). I've been using VMs fairly frequently over the past five years, first with VMWare, and now with VirtualBox.

If the guest machine is possibly affected by malware, how then can one transfer data from the guest to the host? Sending through e-mail divulges password information, and uploading to some type of file transfer site is slow and inconvenient. Running an FTP server or web server on the host takes time and introduces another attack surface. Using VirtualBox's shared clipboard works for text files, and I was able to use it for binary files after escaping characters, but it is also inconvenient and less sure to be safe. VirtualBox's default way of transferring files, emulating an SMB network drive, is not safe, as malware can propagate across a network drive.

I'll describe the approach I came up with. I do use bridged networking so that the guest can ping the host, but I disable all of VirtualBox's shared folders/network drives/USB connectivity. I then make sure that no shared folders on the host are publicly writable. I install Python on the guest and use scripts to transfer files over a socket by IP address. (To see a machine's IP address, run ipconfig in Windows or ifconfig in Linux.)

First run this script on the host, which I put together from some Stack Overflow answers,
import socket

f = open('output_file', 'wb')
conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
portnumber = 8206
conn.bind(('', portnumber))
conn.listen(1)
channel, details = conn.accept()
print('connected')
while True:
  # an empty result means the sender closed the connection
  received_data = channel.recv(4096)
  if not received_data:
    break
  f.write(received_data)
f.close()
channel.close()
conn.close()
print('transfer complete!')

Then run this script on the guest, after changing the file name and IP address. It reads the whole file into memory; I haven't found a need for the script to support files that won't fit into memory.
import socket

file_to_send = './filename'
# set this to the ip address of the host
ip_of_recipient = ''
portnumber = 8206
f = open(file_to_send, 'rb')
all_file_contents = f.read()
f.close()

conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn.connect((ip_of_recipient, portnumber))
conn.sendall(all_file_contents)
conn.close()

To ensure that the data is intact, I can use a quick checksum with SHA512,
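import hashlib

# run on both machines and compare the output
filename = './filename'
hash = hashlib.sha512()
f = open(filename, 'rb')
while True:
  # update the hash 256k at a time
  buf = f.read(1024 * 256)
  if not buf: break
  hash.update(buf)
f.close()

print(hash.hexdigest())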

The chances of malware are now much lower. Only one file can come through a port that is quickly closed. I don't use this to transfer executable files, though, as they could have been modified, but in general it seems to be a safer way to copy files from a VM.
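To try the receive loop and the checksum comparison without a VM, the two roles can be exercised over the loopback interface in one process. This is a sketch in Python 3 under that assumption; the function names are hypothetical, and binding to 127.0.0.1 with port 0 (letting the OS pick a free port) is a choice made for the demonstration rather than something the scripts above do.

```python
import hashlib
import socket
import threading


def sha512_of(data):
    """Return the SHA-512 hex digest of a bytes object."""
    return hashlib.sha512(data).hexdigest()


def receive_all(server, chunks):
    """Accept one connection and collect data until the sender closes it."""
    channel, _ = server.accept()
    try:
        while True:
            data = channel.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        channel.close()


def loopback_transfer(payload):
    """Send payload to a local receiver and return the bytes it received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    chunks = []
    receiver = threading.Thread(target=receive_all, args=(server, chunks))
    receiver.start()
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sender.connect(('127.0.0.1', port))
        sender.sendall(payload)
    finally:
        sender.close()  # closing tells the receiver the transfer is done
    receiver.join()
    server.close()
    return b''.join(chunks)
```

Comparing sha512_of(payload) with sha512_of(loopback_transfer(payload)) mirrors the manual step of running the checksum script on both machines.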