CUDA: How do I use float audio data with cuFFT?


I'm interested in transforming an audio signal with cuFFT to get the data necessary to create a spectrogram. I seem to be losing all of my audio data when trying to convert from float to cufftReal before the transform. Also, I don't think my actual approach is correct for getting the correct result. Here is what I have so far:

    void process_data(float *h_in_data_dynamic, sf_count_t samples, int channels) {
    int nsamples = (int)samples;
    int datasize = 512;
    int batch = nsamples / datasize;

    cufftHandle plan;

    // this makes the data become all 0's.
    cufftReal *d_in_data;
    cudaMalloc((void**)&d_in_data, sizeof(cufftReal) * nsamples);
    cudaMemcpy(d_in_data, (cufftReal*)h_in_data_dynamic, sizeof(cufftReal) * nsamples, cudaMemcpyHostToDevice);

    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex) * nsamples);

    cufftComplex *hostoutputdata = (cufftComplex*)malloc((datasize / 2 + 1) * batch * sizeof(cufftComplex));

    if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "cuda error: failed to allocate\n");
        return;
    }

    int rank = 1;                           // --- 1D FFTs
    int n[] = { datasize };                 // --- size of the Fourier transform
    int istride = 1, ostride = 1;           // --- distance between two successive input/output elements
    int idist = datasize, odist = (datasize / 2) + 1; // --- distance between batches
    int inembed[] = { 0 };                  // --- input size with pitch (ignored for 1D transforms)
    int onembed[] = { 0 };                  // --- output size with pitch (ignored for 1D transforms)

    cufftPlanMany(&plan, rank, n,
                  inembed, istride, idist,
                  onembed, ostride, odist, CUFFT_R2C, batch);

    /* use the cuFFT plan to transform the signal in place. */
    if (cufftExecR2C(plan, d_in_data, data) != CUFFT_SUCCESS) {
        fprintf(stderr, "cufft error: execc2c forward failed");
        return;
    }

    cudaMemcpy(hostoutputdata, data, (datasize / 2) + 1 * batch * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    for (int i = 0; i < batch; i++)
        for (int j = 0; j < (datasize / 2 + 1); j++)
            printf("%i %i %f %f\n", i, j, hostoutputdata[i*(datasize / 2 + 1) + j].x, hostoutputdata[i*(datasize / 2 + 1) + j].y);

    cufftDestroy(plan);
    cudaFree(data);
    cudaFree(d_in_data);
    }

There are a few issues I can see.

  1. You should indent your code for readability.
  2. Any time you're having trouble with CUDA code, use proper error checking. It's a nitpick, but you didn't check the return code of the call to cufftPlanMany. You also aren't doing proper error checking on the last cudaMemcpy call.
  3. The sizes of these 2 allocations should match. They don't:

    cudaMalloc((void**)&data, sizeof(cufftComplex) * nsamples);
    cufftComplex *hostoutputdata = (cufftComplex*)malloc((datasize / 2 + 1) * batch * sizeof(cufftComplex));

    The size of the second allocation above is the correct one, and it should be duplicated in the first one.

  4. You have a basic typo in this line. You should have parentheses where I have indicated:

    cudaMemcpy(hostoutputdata, data, (datasize / 2) + 1 * batch * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
                                     ^                 ^
  5. SO expects that when you're seeking debugging help, you provide an MCVE. It's not the responsibility of others to create a main routine for you, synthesize data, and guess at what headers you are including, what sf_count_t is, and what you are trying to accomplish generally.

  6. Your routine is not taking channels into account. Likewise I have not either, since it is not the issue here. But the use of multi-channel data may have an impact on your code, depending on the data layout.
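To make items 3 and 4 concrete, here is a host-only sketch that compares the byte count the original cudaMemcpy expression produces with the intended one. The `complex_pair` struct is a stand-in I am assuming for cufftComplex (two floats), and the helper names are hypothetical; none of this comes from your code.

```c
#include <stddef.h>

/* Stand-in for cufftComplex (assumption: two floats, 8 bytes total). */
typedef struct { float x, y; } complex_pair;

/* Byte count as written in the question. '*' binds tighter than '+',
   so this evaluates as (datasize / 2) + (1 * batch * sizeof(...)). */
size_t bytes_as_written(int datasize, int batch) {
    return (datasize / 2) + 1 * batch * sizeof(complex_pair);
}

/* Intended byte count: (datasize / 2 + 1) complex bins for every batch. */
size_t bytes_intended(int datasize, int batch) {
    return ((size_t)(datasize / 2) + 1) * (size_t)batch * sizeof(complex_pair);
}
```

With datasize = 512 and, say, batch = 20, the expression as written asks for 416 bytes, while the intended size is 41120 bytes, so only a small fraction of the first batch would ever be copied back to the host.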

When I fix the above issues, I get something that makes sense to me.

    $ cat t621.cu
    #include <cufft.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define fftsize 512
    #define debug 0

    typedef size_t sf_count_t;

    void process_data(float *h_in_data_dynamic, sf_count_t samples, int channels) {
      int nsamples = (int)samples;
      int datasize = fftsize;
      int batch = nsamples / datasize;

      cufftHandle plan;

      cufftReal *d_in_data;
      cudaMalloc((void**)&d_in_data, sizeof(cufftReal) * nsamples);
      cudaMemcpy(d_in_data, (cufftReal*)h_in_data_dynamic, sizeof(cufftReal) * nsamples, cudaMemcpyHostToDevice);

      cufftComplex *data;
      cudaMalloc((void**)&data, sizeof(cufftComplex) * batch * (datasize/2 + 1));

      cufftComplex *hostoutputdata = (cufftComplex*)malloc((datasize / 2 + 1) * batch * sizeof(cufftComplex));

      if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "cuda error: failed to allocate\n");
        return;
      }

      int rank = 1;                           // --- 1D FFTs
      int n[] = { datasize };                 // --- size of the Fourier transform
      int istride = 1, ostride = 1;           // --- distance between two successive input/output elements
      int idist = datasize, odist = (datasize / 2) + 1; // --- distance between batches
      int inembed[] = { 0 };                  // --- input size with pitch (ignored for 1D transforms)
      int onembed[] = { 0 };                  // --- output size with pitch (ignored for 1D transforms)

      if(cufftPlanMany(&plan, rank, n,
                  inembed, istride, idist,
                  onembed, ostride, odist, CUFFT_R2C, batch) != CUFFT_SUCCESS){
        fprintf(stderr, "cufft error: plan failed");
        return;
      }

      /* use the cuFFT plan to transform the signal (out of place: d_in_data -> data). */
      if (cufftExecR2C(plan, d_in_data, data) != CUFFT_SUCCESS) {
        fprintf(stderr, "cufft error: execr2c forward failed");
        return;
      }

      cudaMemcpy(hostoutputdata, data, ((datasize / 2) + 1) * batch * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
      if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "cuda error: failed results copy\n");
        return;
      }

      float *spectrum = (float *)malloc((datasize/2)*sizeof(float));
      for (int j = 0; j < (datasize/2); j++) spectrum[j] = 0.0f;
      for (int i=0; i < batch; i++)
        for (int j=0; j < (datasize / 2 + 1); j++){
    #if debug
            printf("%i %i %f %f\n", i, j, hostoutputdata[i*(datasize / 2 + 1) + j].x, hostoutputdata[i*(datasize / 2 + 1) + j].y);
    #endif
            // compute spectral magnitude
            // note that cufft induces a scale factor of fftsize
            if (j < (datasize/2)) spectrum[j] += sqrt(pow(hostoutputdata[i*(datasize/2 +1) +j].x, 2) + pow(hostoutputdata[i*(datasize/2 +1) +j].y, 2))/(float)(batch*datasize);
            }
      // assumes fs is half of fftsize, or we could pass fs separately
      printf("spectrum\n hz:   magnitude:\n");
      for (int j = 0; j < (datasize/2); j++) printf("%.3f %.3f\n", j/2.0f, spectrum[j]);

      cufftDestroy(plan);
      cudaFree(data);
      cudaFree(d_in_data);
    }

    int main(){

      const int nsets = 20;
      const float sampling_rate = fftsize/2;
      const float amplitude = 1.0;
      const float fc1 = 6.0;
      const float fc2 = 4.5;
      float *my_data;

      my_data = (float *)malloc(nsets*fftsize*sizeof(float));
      // generate synthetic data that is a mix of 2 sine waves at fc1 and fc2 Hz
      for (int i = 0; i < nsets*fftsize; i++)
        my_data[i] = amplitude*sin(fc1*(6.283/sampling_rate)*i)
                   + amplitude*sin(fc2*(6.283/sampling_rate)*i);

      process_data(my_data, nsets*fftsize, 1);
      return 0;
    }

    $ nvcc -arch=sm_20 -o t621 t621.cu -lcufft
    $ ./t621
    spectrum
     hz:   magnitude:
    0.000 0.000
    0.500 0.000
    1.000 0.000
    1.500 0.000
    2.000 0.000
    2.500 0.000
    3.000 0.000
    3.500 0.000
    4.000 0.000
    4.500 0.500
    5.000 0.000
    5.500 0.000
    6.000 0.500
    6.500 0.000
    7.000 0.000
    7.500 0.000
    8.000 0.000
    8.500 0.000
    9.000 0.000
    9.500 0.000
    10.000 0.000
    10.500 0.000
    11.000 0.000
    11.500 0.000
    12.000 0.000
    12.500 0.000
    13.000 0.000
    13.500 0.000
    14.000 0.000
    14.500 0.000
    15.000 0.000
    15.500 0.000
    16.000 0.000
    16.500 0.000
    17.000 0.000
    17.500 0.000
    18.000 0.000
    18.500 0.000
    19.000 0.000
    19.500 0.000
    20.000 0.000
    20.500 0.000
    21.000 0.000
    21.500 0.000
    22.000 0.000
    22.500 0.000
    23.000 0.000
    23.500 0.000
    24.000 0.000
    24.500 0.000
    25.000 0.000
    25.500 0.000
    26.000 0.000
    26.500 0.000
    27.000 0.000
    27.500 0.000
    28.000 0.000
    28.500 0.000
    29.000 0.000
    29.500 0.000
    30.000 0.000
    30.500 0.000
    31.000 0.000
    31.500 0.000
    32.000 0.000
    32.500 0.000
    33.000 0.000
    33.500 0.000
    34.000 0.000
    34.500 0.000
    35.000 0.000
    35.500 0.000
    36.000 0.000
    36.500 0.000
    37.000 0.000
    37.500 0.000
    38.000 0.000
    38.500 0.000
    39.000 0.000
    39.500 0.000
    40.000 0.000
    40.500 0.000
    41.000 0.000
    41.500 0.000
    42.000 0.000
    42.500 0.000
    43.000 0.000
    43.500 0.000
    44.000 0.000
    44.500 0.000
    45.000 0.000
    45.500 0.000
    46.000 0.000
    46.500 0.000
    47.000 0.000
    47.500 0.000
    48.000 0.000
    48.500 0.000
    49.000 0.000
    49.500 0.000
    50.000 0.000
    50.500 0.000
    51.000 0.000
    51.500 0.000
    52.000 0.000
    52.500 0.000
    53.000 0.000
    53.500 0.000
    54.000 0.000
    54.500 0.000
    55.000 0.000
    55.500 0.000
    56.000 0.000
    56.500 0.000
    57.000 0.000
    57.500 0.000
    58.000 0.000
    58.500 0.000
    59.000 0.000
    59.500 0.000
    60.000 0.000
    60.500 0.000
    61.000 0.000
    61.500 0.000
    62.000 0.000
    62.500 0.000
    63.000 0.000
    63.500 0.000
    64.000 0.000
    64.500 0.000
    65.000 0.000
    65.500 0.000
    66.000 0.000
    66.500 0.000
    67.000 0.000
    67.500 0.000
    68.000 0.000
    68.500 0.000
    69.000 0.000
    69.500 0.000
    70.000 0.000
    70.500 0.000
    71.000 0.000
    71.500 0.000
    72.000 0.000
    72.500 0.000
    73.000 0.000
    73.500 0.000
    74.000 0.000
    74.500 0.000
    75.000 0.000
    75.500 0.000
    76.000 0.000
    76.500 0.000
    77.000 0.000
    77.500 0.000
    78.000 0.000
    78.500 0.000
    79.000 0.000
    79.500 0.000
    80.000 0.000
    80.500 0.000
    81.000 0.000
    81.500 0.000
    82.000 0.000
    82.500 0.000
    83.000 0.000
    83.500 0.000
    84.000 0.000
    84.500 0.000
    85.000 0.000
    85.500 0.000
    86.000 0.000
    86.500 0.000
    87.000 0.000
    87.500 0.000
    88.000 0.000
    88.500 0.000
    89.000 0.000
    89.500 0.000
    90.000 0.000
    90.500 0.000
    91.000 0.000
    91.500 0.000
    92.000 0.000
    92.500 0.000
    93.000 0.000
    93.500 0.000
    94.000 0.000
    94.500 0.000
    95.000 0.000
    95.500 0.000
    96.000 0.000
    96.500 0.000
    97.000 0.000
    97.500 0.000
    98.000 0.000
    98.500 0.000
    99.000 0.000
    99.500 0.000
    100.000 0.000
    100.500 0.000
    101.000 0.000
    101.500 0.000
    102.000 0.000
    102.500 0.000
    103.000 0.000
    103.500 0.000
    104.000 0.000
    104.500 0.000
    105.000 0.000
    105.500 0.000
    106.000 0.000
    106.500 0.000
    107.000 0.000
    107.500 0.000
    108.000 0.000
    108.500 0.000
    109.000 0.000
    109.500 0.000
    110.000 0.000
    110.500 0.000
    111.000 0.000
    111.500 0.000
    112.000 0.000
    112.500 0.000
    113.000 0.000
    113.500 0.000
    114.000 0.000
    114.500 0.000
    115.000 0.000
    115.500 0.000
    116.000 0.000
    116.500 0.000
    117.000 0.000
    117.500 0.000
    118.000 0.000
    118.500 0.000
    119.000 0.000
    119.500 0.000
    120.000 0.000
    120.500 0.000
    121.000 0.000
    121.500 0.000
    122.000 0.000
    122.500 0.000
    123.000 0.000
    123.500 0.000
    124.000 0.000
    124.500 0.000
    125.000 0.000
    125.500 0.000
    126.000 0.000
    126.500 0.000
    127.000 0.000
    127.500 0.000
    $

The indicated spectrum has spikes at 4.5 Hz and 6.0 Hz, as we would expect based on the composition of the synthetic input data. Note that the question does not appear to be about the mechanics of spectral computation, and I am not an expert in that. The purpose is to generate a set of output data that allows us to validate the results. I'm not suggesting this spectral computation is useful for any particular purpose, or correct according to the mathematics. The purpose here was to root out the underlying CUDA errors in your code.
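The Hz column in that output follows the usual DFT bin spacing: bin j of an N-point transform sampled at Fs sits at j*Fs/N. A minimal host-only sketch, with helper names (`bin_to_hz`, `hz_to_bin`) that are mine rather than part of cuFFT, shows where the two test tones should land for Fs = 256 and N = 512:

```c
/* Centre frequency in Hz of bin j for an n-point DFT sampled at fs Hz. */
float bin_to_hz(int j, float fs, int n) {
    return (float)j * fs / (float)n;
}

/* Nearest bin index for a non-negative frequency in Hz
   (inverse of bin_to_hz, rounded to the closest bin). */
int hz_to_bin(float hz, float fs, int n) {
    return (int)(hz * (float)n / fs + 0.5f);
}
```

With these values the bin spacing is 0.5 Hz, so 4.5 Hz maps to bin 9 and 6.0 Hz to bin 12, which is where the spikes appear. The 0.500 magnitude is also what I would expect from the normalization, assuming each tone sits on a bin centre: a unit-amplitude sine then gives a bin magnitude of about N/2, and the listing divides by N.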

As an additional comment, your code is set up to do a piecewise FFT on an input data set of arbitrary length (my interpretation, based on your usage of batch). So that is how I crafted the result. I think it's a reasonable thing to do, but whether it makes sense for your particular use case, I don't know.
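One consequence of that piecewise layout is worth spelling out: `batch = nsamples / datasize` is an integer division, so any samples past the last full block are never transformed. The sketch below is plain host C with hypothetical helper names. It computes the block count and the leftover, and shows one possible way to keep the tail by zero-padding. Padding is my assumption here, not something the code above does, and whether it is appropriate depends on your use case.

```c
#include <stdlib.h>
#include <string.h>

/* Number of complete blocks, as in `batch = nsamples / datasize`. */
int full_blocks(int nsamples, int datasize) {
    return nsamples / datasize;
}

/* Samples left over after the last complete block; the batched plan
   never sees these. */
int leftover_samples(int nsamples, int datasize) {
    return nsamples % datasize;
}

/* Copy the input into a buffer rounded up to a whole number of blocks,
   zero-filling the tail. Returns NULL on allocation failure; the caller
   frees the result. *padded_len receives the new length in samples. */
float *pad_to_blocks(const float *in, int nsamples, int datasize,
                     int *padded_len) {
    int blocks = (nsamples + datasize - 1) / datasize;  /* round up */
    int total = blocks * datasize;
    float *out = (float *)calloc((size_t)total, sizeof(float));
    if (out == NULL) return NULL;
    memcpy(out, in, (size_t)nsamples * sizeof(float));
    *padded_len = total;
    return out;
}
```

For example, 1300 samples with a block size of 512 give 2 full blocks and 276 leftover samples; padding rounds the length up to 1536, so a third block would be transformed.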

