Wednesday, January 1, 2014

How to: Convert a file with raw PCM samples to MP4

Note: I have been working on some audio processing projects at work and thought I could share some code that others might find useful. This one is the third in the series. First in the series is here. Second is here.

We use built-in Android classes (MediaMuxer, MediaFormat and MediaCodec) for the conversion. MediaCodec and MediaFormat were introduced in Jelly Bean (API level 16), but MediaMuxer is available only from API level 18 (Android 4.3) onwards, so that is the minimum for this approach. For older versions you will need to rely on native libraries and JNI to get the job done.
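
The version requirement above can be expressed as a small guard. This is only a sketch: the class name `MuxerSupport` and the method `canUseMediaMuxer` are hypothetical helpers, not part of the Android SDK. The one fact it relies on is that MediaMuxer first appeared in API level 18.

```java
/** Hypothetical helper (not an Android API) for choosing the conversion path. */
final class MuxerSupport {
    /** API level in which android.media.MediaMuxer first appeared (Android 4.3). */
    static final int MEDIA_MUXER_MIN_SDK = 18;

    private MuxerSupport() {}

    /** Returns true when the given platform API level is new enough for the MediaMuxer path. */
    static boolean canUseMediaMuxer(int sdkInt) {
        return sdkInt >= MEDIA_MUXER_MIN_SDK;
    }
}
```

On a device you would call `MuxerSupport.canUseMediaMuxer(Build.VERSION.SDK_INT)` and fall back to the native/JNI path when it returns false.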

As with recording and playback, do the conversion in a background thread or an AsyncTask. Below is the code snippet:

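// Imports required by this snippet (place them at the top of the enclosing class file)
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import android.os.Environment;
import android.util.Log;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;

public static final String AUDIO_RECORDING_FILE_NAME = "recording.raw"; // Input PCM file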
public static final String COMPRESSED_AUDIO_FILE_NAME = "compressed.mp4"; // Output MP4 file
public static final String COMPRESSED_AUDIO_FILE_MIME_TYPE = "audio/mp4a-latm";
public static final int COMPRESSED_AUDIO_FILE_BIT_RATE = 128000; // 128kbps
public static final int SAMPLING_RATE = 44100;
public static final int CODEC_TIMEOUT_IN_MS = 5000;
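public static final int BUFFER_SIZE = 88200; // 1 second of 16-bit mono PCM at 44100 Hz
private static final String LOGTAG = "PcmToMp4"; // Assumed tag; the declaration is not shown in the original snippet
private volatile boolean mStop = false; // Assumed stop flag; set it from another thread to cancel the conversion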

@Override
public void run() {
    android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_BACKGROUND);

    try {
        String filePath = Environment.getExternalStorageDirectory().getPath() + "/" + AUDIO_RECORDING_FILE_NAME;
        File inputFile = new File(filePath);
        FileInputStream fis = new FileInputStream(inputFile);

        File outputFile = new File(Environment.getExternalStorageDirectory().getAbsolutePath() + "/" + COMPRESSED_AUDIO_FILE_NAME);
        if (outputFile.exists()) outputFile.delete();

        MediaMuxer mux = new MediaMuxer(outputFile.getAbsolutePath(), MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);

        MediaFormat outputFormat = MediaFormat.createAudioFormat(COMPRESSED_AUDIO_FILE_MIME_TYPE,
                SAMPLING_RATE, 1);
        outputFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
        outputFormat.setInteger(MediaFormat.KEY_BIT_RATE, COMPRESSED_AUDIO_FILE_BIT_RATE);

        MediaCodec codec = MediaCodec.createEncoderByType(COMPRESSED_AUDIO_FILE_MIME_TYPE);
        codec.configure(outputFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        codec.start();

        ByteBuffer[] codecInputBuffers = codec.getInputBuffers(); // Note: Array of buffers
        ByteBuffer[] codecOutputBuffers = codec.getOutputBuffers();

        MediaCodec.BufferInfo outBuffInfo = new MediaCodec.BufferInfo();

        byte[] tempBuffer = new byte[BUFFER_SIZE];
        boolean hasMoreData = true;
        double presentationTimeUs = 0;
        int audioTrackIdx = 0;
        int totalBytesRead = 0;
        int percentComplete;

        do {

            int inputBufIndex = 0;
            while (inputBufIndex != -1 && hasMoreData) {
                inputBufIndex = codec.dequeueInputBuffer(CODEC_TIMEOUT_IN_MS);

                if (inputBufIndex >= 0) {
                    ByteBuffer dstBuf = codecInputBuffers[inputBufIndex];
                    dstBuf.clear();

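                    // Never ask for more bytes than tempBuffer can hold, even if the codec buffer is larger
                    int bytesRead = fis.read(tempBuffer, 0, Math.min(tempBuffer.length, dstBuf.limit()));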
                    if (bytesRead == -1) { // -1 implies EOS
                        hasMoreData = false;
                        codec.queueInputBuffer(inputBufIndex, 0, 0, (long) presentationTimeUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    } else {
                        totalBytesRead += bytesRead;
                        dstBuf.put(tempBuffer, 0, bytesRead);
                        codec.queueInputBuffer(inputBufIndex, 0, bytesRead, (long) presentationTimeUs, 0);
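                        // 2 bytes per 16-bit mono sample; the long literal keeps the math in 64 bits and yields microseconds
                        presentationTimeUs = 1000000L * (totalBytesRead / 2) / SAMPLING_RATE;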
                    }
                }
            }

            // Drain audio
            int outputBufIndex = 0;
            while (outputBufIndex != MediaCodec.INFO_TRY_AGAIN_LATER) {

                outputBufIndex = codec.dequeueOutputBuffer(outBuffInfo, CODEC_TIMEOUT_IN_MS);
                if (outputBufIndex >= 0) {
                    ByteBuffer encodedData = codecOutputBuffers[outputBufIndex];
                    encodedData.position(outBuffInfo.offset);
                    encodedData.limit(outBuffInfo.offset + outBuffInfo.size);

                    if ((outBuffInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0 && outBuffInfo.size != 0) {
                        codec.releaseOutputBuffer(outputBufIndex, false);
                    } else {
                        mux.writeSampleData(audioTrackIdx, codecOutputBuffers[outputBufIndex], outBuffInfo);
                        codec.releaseOutputBuffer(outputBufIndex, false);
                    }
                } else if (outputBufIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                    outputFormat = codec.getOutputFormat();
                    Log.v(LOGTAG, "Output format changed - " + outputFormat);
                    audioTrackIdx = mux.addTrack(outputFormat);
                    mux.start();
                } else if (outputBufIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                    // The codec replaced its output buffers; refresh our reference before reading from them again
                    codecOutputBuffers = codec.getOutputBuffers();
                } else if (outputBufIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                    // NO OP
                } else {
                    Log.e(LOGTAG, "Unknown return code from dequeueOutputBuffer - " + outputBufIndex);
                }
            }
            percentComplete = (int) Math.round(((float) totalBytesRead / (float) inputFile.length()) * 100.0);
            Log.v(LOGTAG, "Conversion % - " + percentComplete);
        } while ((outBuffInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) == 0 && !mStop); // flags is a bit mask

        fis.close();
        codec.stop();
        codec.release(); // Free the encoder as well as the muxer
        mux.stop();
        mux.release();
        Log.v(LOGTAG, "Compression done ...");
    } catch (FileNotFoundException e) {
        Log.e(LOGTAG, "File not found!", e);
    } catch (IOException e) {
        Log.e(LOGTAG, "IO exception!", e);
    }

    mStop = false;
    // Notify UI thread...
}
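
The listing computes presentationTimeUs from the byte count by dividing by 2, which assumes 16-bit mono input. As a minimal sketch (the class `PcmTiming` and its method are hypothetical names, not part of the code above), the same arithmetic can be written for any channel count and sample width. The timestamp passed to queueInputBuffer is in microseconds, hence the factor of 1,000,000.

```java
/** Hypothetical helper that generalizes the timestamp arithmetic used in the listing. */
final class PcmTiming {
    private PcmTiming() {}

    /**
     * Converts a count of raw PCM bytes into elapsed playback time in microseconds.
     *
     * @param totalBytes     number of PCM bytes consumed so far
     * @param sampleRate     frames per second, e.g. 44100
     * @param channels       1 for mono, 2 for stereo
     * @param bytesPerSample 2 for 16-bit PCM
     */
    static long presentationTimeUs(long totalBytes, int sampleRate, int channels, int bytesPerSample) {
        long bytesPerFrame = (long) channels * bytesPerSample; // one frame = one sample per channel
        long frames = totalBytes / bytesPerFrame;
        return frames * 1_000_000L / sampleRate;               // long literal keeps the math in 64 bits
    }

    public static void main(String[] args) {
        System.out.println(presentationTimeUs(88_200, 44_100, 1, 2)); // mono:   1000000
        System.out.println(presentationTimeUs(88_200, 44_100, 2, 2)); // stereo: 500000
    }
}
```

For stereo input the channel count passed to MediaFormat.createAudioFormat would presumably need to be 2 as well. If the timestamps are still computed as if the data were mono, the reported duration comes out twice as long as the real one.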

15 comments:

  1. Reached here after browsing tens of posts about AAC encoding. All of those used MediaExtractor. This one is great and helpful. One question: how can we decode raw AAC to raw PCM using MediaCodec? (It's easy with MediaExtractor but . .)

  2. Thanks a lot! I have only one problem with the decoding of my PCM data: the resulting track has twice the duration and a lot of synchronous gaps, about 10-20 every second. I have already set outputFormat.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 2); without it the track was slowed down twofold. Do you know what it might be? My PCM is 44100 Hz, 2 channels, 16-bit samples.

  3. Nice.
    How do you keep only a few seconds of the audio (a predefined value, or the length of another buffer such as video) and mux it with video?

  4. Thanks, this was a great help!

  5. @Abhilash Ramakrishna
    Why are these lines of code there? IMHO they are not needed.

    ByteBuffer dstBuf = codecInputBuffers[inputBufIndex];
    dstBuf.clear();
    dstBuf.put(tempBuffer, 0, bytesRead);


    ByteBuffer encodedData = codecOutputBuffers[outputBufIndex];
    encodedData.position(outBuffInfo.offset);
    encodedData.limit(outBuffInfo.offset + outBuffInfo.size);

  6. This comment has been removed by the author.

  7. This comment has been removed by the author.

  8. @Abhilash Ramakrishna

    The program runs without these 3 lines of code?

    ByteBuffer encodedData = codecOutputBuffers[outputBufIndex];
    encodedData.position(outBuffInfo.offset);
    encodedData.limit(outBuffInfo.offset + outBuffInfo.size);

  9. @Abhilash Ramakrishna

    Also, what is the point of this line? The program works without it:

    codec.releaseOutputBuffer(outputBufIndex, false);

  10. Hi, can you provide code for converting from MPEG to raw? I am trying to do an FFT on the converted data, so would need to convert it first to PCM16 (audio/raw) format.

    thanks
    Srini

  11. When calculating presentationTimeUs: presentationTimeUs = 1000000l * (totalBytesRead / 2) / SAMPLING_RATE; why 1000000l?

  12. Hi, I'm running this with mp3 files, and when the file arrives it's just static. Can you point me in the right direction?

    Thanks!

  13. Hi,
    It produces an audio file, but there is no sound when it plays.