"""
kcluster_afromb.py
2013 CJ Carr, Zack Zukowski
cortexel.us
http://github.com/cortexelus/dadabots
An improved version of Ben Lacker's afromb.py.
Uses k-means clustering on timbre data.
For each segment in a:
Puts it into one of k groups
Matches that group with a group in b
Picks a new segment from b's group (randomly, the best match, or from a set of best matches)
In this way, the diversity of timbre is preserved.
(afromb.py takes every segment in a and finds the closest segment in b, but that process doesn't preserve the diversity of b)
This makes it possible to take simple beats (kick, snare, hi-hat) and layer an onslaught of samples on top of them while preserving the ebb and flow of the rhythm.
For example, this song: http://soundcloud.com/cortexelus/23-mindsplosion-algorithmic
#############################
USAGE:
python kcluster_afromb.py INPUT1 INPUT2 OUTPUT MIX NUMCLUSTERS BESTMATCH
EXAMPLES:
python kcluster_afromb.py song_structure.mp3 sample_library.mp3 remix.mp3 0.8 10 2
# Setting NUMCLUSTERS to 1 and BESTMATCH to 1 is essentially the same thing as running the original afromb.py
python kcluster_afromb.py a.mp3 b.mp3 out.mp3 0.5 1 1
# To layer an onslaught of samples on a simple beat, set k to around 3-6 and set bestmatch to 0. Sample onslaught!!!!!!!!
python kcluster_afromb.py drumbreaks.mp3 synthlib.mp3 out.mp3 0.5 4 0
#############################
INPUT1: soundfile, the song's structure
INPUT2: soundfile, where the samples are coming from
OUTPUT: output file, sounds like INPUT2 remixed to fit the structure of INPUT1
MIX: The volume mix between the remixed INPUT2 and the original INPUT1.
0 only the original INPUT1
1.0 only the remixed INPUT2
0.5 half of each
NUMCLUSTERS: the k in k-means clustering.
There are usually about 500-1500 segments in a song.
We group those segments into k groups.
k = 1 is the same as not running k-means clustering at all.
I usually get good results using something between k = 3 (kick, snare, hi-hat) and k = 20, but you should really just experiment, because it depends on your source files and what effect you want.
BESTMATCH: Which segment do we pick from b's group?
0 random
1 pick the best match
2 pick randomly from the best 2 matches
n pick randomly from the best n matches
#############################
TODO:
* How do we fit b's segments to a's structure? Right now we add silence. TODO: add option to stretch audio.
* Option to set K as a fraction of total segments.
<3\m/
"""
from numpy import array
from numpy.linalg import norm
from scipy.cluster.vq import kmeans, vq
from random import choice
import sys
import echonest.audio as audio
from echonest.sorting import *
from echonest.selection import *
import pyechonest.config as config
config.MP3_BITRATE = 192 # take this line out if you want to use default bitrate
inputFilename = sys.argv[1]
inputFilename2 = sys.argv[2]
outputFilename = sys.argv[3]
# the volume mix between the remixed INPUT2 and the original INPUT1
mix = float(sys.argv[4])
# how many different groups will we cluster our data into?
num_clusters = int(sys.argv[5])
# best_match = 1 # slower, less varied version. Good for b's which are percussion loops
# best_match = 0 # faster, more varied version, picks a random segment from that cluster. Good for b's which are sample salads.
best_match = int(sys.argv[6])
# analyze the songs
song = audio.LocalAudioFile(inputFilename)
song2 = audio.LocalAudioFile(inputFilename2)
# build a blank output song, to populate later
sample_rate = song.sampleRate
num_channels = song.numChannels
out_shape = list(song.data.shape)
out_shape[0] = 2 # just a 2-sample placeholder; the remixed audio is appended below
out = audio.AudioData(shape=out_shape, sampleRate=sample_rate, numChannels=num_channels)
# grab timbre data
# must be converted to a numpy.array() so that kmeans(data, n) is happy
data = array(song.analysis.segments.timbre)
data2 = array(song2.analysis.segments.timbre)
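# (A possible take on the "set K as a fraction of total segments" TODO in the
# docstring, sketched here but not wired to the command line: derive k from
# the segment count instead of passing it in, e.g.
#   num_clusters = max(1, int(0.02 * len(data)))
# where 0.02 is a made-up fraction to experiment with.)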
# computing K-Means with k = num_clusters
centroids,_ = kmeans(data,num_clusters)
centroids2,_ = kmeans(data2,num_clusters)
# assign each segment to a cluster
idx,_ = vq(data, centroids)
idx2,_ = vq(data2, centroids2)
# idx lists the cluster that each data point belongs to
# ex. (k=3) [2, 0, 0, 1, 0, 2, 0, 0, 1, 2]
# How to pair up clusters?
# I think a largest-first greedy algorithm will work.
# 1) Find largest cluster A[c] in A
# 2) Find closest cluster in B from A[c]
# 3) Pair them. Remove from data.
# 4) Continue until everything is paired.
# first build a collection of [count, cluster] pairs, then sort it
# (not using Python's collections.Counter, for Python 2.6 compatibility)
collection = []
for c in range(0, num_clusters):
    ccount = 0
    for i in idx:
        if i == c:
            ccount += 1
    collection.append([ccount, c])
# sort largest to smallest, to match the largest-first greedy algorithm above
collection.sort(reverse=True)
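# collection now looks something like [[412, 3], [207, 0], [95, 2], ...]
# (illustrative counts): each entry is [segment count, cluster index]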
centroid_pairs = []
for _, c in collection:
    centroid1 = array(centroids[c])
    min_distance = [float('inf'), 0]
    for ci in range(0, len(centroids2)):
        if ci in [li[1] for li in centroid_pairs]:
            continue
        centroid2 = array(centroids2[ci])
        euclidean_distance = norm(centroid1 - centroid2)
        if euclidean_distance < min_distance[0]:
            min_distance = [euclidean_distance, ci]
    centroid_pairs.append([c, min_distance[1]])
print centroid_pairs
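# prints the pairing as [a_cluster, b_cluster] rows,
# e.g. [[3, 1], [0, 0], [2, 4], [1, 2], [4, 3]] for k = 5 (illustrative values)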
# now we have a list of paired up cluster indices. Cool.
# Just so we're clear, we're rebuilding the structure of song1 with segments from song2
# prepare song2's clusters: bucket each of song2's segments by cluster
# (build with a comprehension; [AudioQuantumList()]*n would alias one shared list across every cluster)
segclusters2 = [audio.AudioQuantumList() for _ in range(len(centroids2))]
for s2 in range(0, len(idx2)):
    segment2 = song2.analysis.segments[s2]
    cluster2 = idx2[s2]
    segment2.numpytimbre = array(segment2.timbre)
    segclusters2[cluster2].append(segment2)
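# segclusters2[c] now holds every song2 segment assigned to cluster c, each
# with its timbre vector cached as a numpy array for fast distance math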
# for each segment1 in song1, find the timbrally closest segment2 in song2,
# drawn from the cluster2 that segment1's cluster1 is paired with
for s in range(0, len(idx)):
    segment1 = song.analysis.segments[s]
    cluster1 = idx[s]
    cluster2 = [li[1] for li in centroid_pairs if li[0] == cluster1][0]
    if best_match > 0:
        # slower, less varied version. Good for b's which are percussion loops
        # (the same search written out by hand, kept for reference; there's
        # already a function for this, timbre_distance_from, so we use that instead:
        #   timbre1 = array(segment1.timbre)
        #   min_distance = [float('inf'), None]
        #   for seg in segclusters2[cluster2]:
        #       timbre2 = seg.numpytimbre
        #       euclidean_distance = norm(timbre2 - timbre1)
        #       if euclidean_distance < min_distance[0]:
        #           min_distance = [euclidean_distance, seg]
        #   bestmatchsegment2 = min_distance[1]
        # )
        bestmatches = segclusters2[cluster2].ordered_by(timbre_distance_from(segment1))
        if best_match > 1:
            # if best_match > 1, randomly grab one of the top best_match matches
            # (min, not max, so we never look past the top best_match candidates)
            maxmatches = min(best_match, len(bestmatches))
            bestmatchsegment2 = choice(bestmatches[0:maxmatches])
        else:
            # if best_match == 1, grab the exact best match
            bestmatchsegment2 = bestmatches[0]
    else:
        # faster, more varied version: picks a random segment from that cluster. Good for sample salads.
        bestmatchsegment2 = choice(segclusters2[cluster2])
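    # indexing the LocalAudioFile with a segment returns that segment's
    # audio as an AudioData buffer we can pad, trim, and mix below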
    reference_data = song[segment1]
    segment_data = song2[bestmatchsegment2]
    # what to do when segment lengths aren't equal? (almost always)
    # do we add silence? or do we stretch the samples?
    add_silence = True
    # This is the add-silence solution:
    if add_silence:
        if reference_data.endindex > segment_data.endindex:
            # we need to pad segment2 with silence, because segment1 is longer
            if num_channels > 1:
                silence_shape = (reference_data.endindex, num_channels)
            else:
                silence_shape = (reference_data.endindex,)
            new_segment = audio.AudioData(shape=silence_shape,
                                          sampleRate=out.sampleRate,
                                          numChannels=segment_data.numChannels)
            new_segment.append(segment_data)
            new_segment.endindex = len(new_segment)
            segment_data = new_segment
        elif reference_data.endindex < segment_data.endindex:
            # we need to truncate segment2, because segment2 is longer
            index = slice(0, int(reference_data.endindex), 1)
            segment_data = audio.AudioData(None, segment_data.data[index], sampleRate=segment_data.sampleRate)
    else:
        # TODO: stretch samples to fit.
        # haven't written this part yet.
        segment_data = segment_data
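        # (A sketch of the stretch option: the remix distribution included an
        # echonest.modify module with a Modify.shiftTempo(audio_data, ratio)
        # method; whether your install has it is an assumption to verify.
        #   import echonest.modify as modify
        #   ratio = float(segment_data.endindex) / reference_data.endindex
        #   segment_data = modify.Modify().shiftTempo(segment_data, ratio)
        # The exact ratio convention depends on the API, so check its docs.)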
    # mix the original and the remix
    mixed_data = audio.mix(segment_data, reference_data, mix=mix)
    out.append(mixed_data)
# render the output
out.encode(outputFilename)