Learning Feature Representations for Audio-Visual Tasks - Meru Sandbox