ISpeechToTextClient does not allow to specify the audio FileName #7543

@marcominerva

Description

If we create an ISpeechToTextClient from an OpenAIClient, an object of type OpenAISpeechToTextClient (defined in OpenAISpeechToTextClient.cs) is instantiated.

As we can see in the following lines:

string filename = audioSpeechStream is FileStream fileStream ?
    Path.GetFileName(fileStream.Name) : // Use the file name if we can get one from the stream.
    Filename; // Otherwise, use a default name; this is only used to create a header name in the multipart request.

And then:

var transcription = (await _audioClient.TranscribeAudioAsync(audioSpeechStream, filename, ToOpenAITranscriptionOptions(options), cancellationToken).ConfigureAwait(false)).Value;

The client requires a file name to determine the audio format. Currently, if the stream isn't a FileStream, a default name with the .mp3 extension is used.
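To make the effect of that selection concrete, here is a minimal standalone sketch of the same logic. `ResolveFileName` is a hypothetical helper written for illustration, and `"audio.mp3"` is an assumed placeholder for the default name; the only thing stated here is that the real default ends in .mp3.

```csharp
using System;
using System.IO;

// Hypothetical helper mirroring the quoted logic: only a FileStream
// contributes its own name; any other stream gets the default.
static string ResolveFileName(Stream stream, string defaultFileName) =>
    stream is FileStream fileStream
        ? Path.GetFileName(fileStream.Name)
        : defaultFileName;

// A WAV payload held in memory carries no file name at all.
using var wavInMemory = new MemoryStream();

// "audio.mp3" is an assumed stand-in for the client's default name.
Console.WriteLine(ResolveFileName(wavInMemory, "audio.mp3")); // prints "audio.mp3"
```

With a MemoryStream the extension is always taken from the default, regardless of what the bytes actually contain.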

So, if I have a WAV audio stream that is not a FileStream, I get the following exception:

System.ClientModel.ClientResultException: 'HTTP 400 (invalid_request_error: invalid_value)
Parameter: file

Audio file might be corrupted or unsupported'

Reproduction Steps

#!/usr/bin/env dotnet

#:sdk Microsoft.NET.Sdk

#:property OutputType=Exe
#:property TargetFramework=net10.0
#:property ImplicitUsings=enable
#:property NoWarn=$(NoWarn);MEAI001
#:property PublishAot=false

#:package Azure.AI.OpenAI@2.9.0-beta.1
#:package Microsoft.Extensions.AI.OpenAI@10.6.0
#:package Microsoft.Extensions.Logging@10.0.8
#:package Microsoft.Extensions.Logging.Console@10.0.8
#:package NAudio@2.3.0

using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using NAudio.Wave;

Console.WriteLine("Press any key to start sample...");
Console.ReadKey();
Console.WriteLine("Recording audio... Press any key to stop recording.");

var waveFormat = new WaveFormat(44100, 1);
using var memoryStream = new MemoryStream();
using var waveIn = new WaveInEvent
{
    WaveFormat = waveFormat,
};

using (var waveStream = new WaveFileWriter(memoryStream, waveFormat))
{
    waveIn.DataAvailable += (_, e) =>
    {
        waveStream.Write(e.Buffer, 0, e.BytesRecorded);
    };

    waveIn.StartRecording();

    _ = Console.ReadKey();

    waveIn.StopRecording();
}

var endpoint = "";
var apiKey = "";
var model = "";

var azureClient = new AzureOpenAIClient(new(endpoint), new ApiKeyCredential(apiKey));
var audioClient = azureClient.GetAudioClient(model).AsISpeechToTextClient();

var audioBytes = memoryStream.ToArray();
using var transcriptionStream = new MemoryStream(audioBytes);
var transcription = await audioClient.GetTextAsync(transcriptionStream);

Console.WriteLine("Transcription:");
Console.WriteLine(transcription.ToString());

Expected behavior

It must be possible to specify the audio file format.

Actual behavior

Exception:

System.ClientModel.ClientResultException: 'HTTP 400 (invalid_request_error: invalid_value)
Parameter: file

Audio file might be corrupted or unsupported'
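Until the file name can be set explicitly, one possible workaround is to rely on the FileStream branch quoted in the description. The sketch below is untested against a live endpoint and rests on an assumption: that the service accepts the audio once the file name ends in .wav. `TranscribeWavViaTempFileAsync` is a hypothetical helper, not part of any library.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical helper: writes the in-memory WAV bytes to a temporary ".wav" file
// so that the stream handed to the client is a FileStream with a matching extension.
static async Task<string> TranscribeWavViaTempFileAsync(
    ISpeechToTextClient client,
    byte[] wavBytes,
    CancellationToken cancellationToken = default)
{
    var tempPath = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempPath, wavBytes, cancellationToken);

    try
    {
        // File.OpenRead returns a FileStream, so its Name ends in ".wav".
        await using var fileStream = File.OpenRead(tempPath);
        var response = await client.GetTextAsync(fileStream, cancellationToken: cancellationToken);
        return response.ToString();
    }
    finally
    {
        // The stream is disposed when the try block exits, so the file can be removed here.
        File.Delete(tempPath);
    }
}
```

In the reproduction above it would be called as `await TranscribeWavViaTempFileAsync(audioClient, audioBytes)`. The extra disk round trip is the cost of this approach, which is why a way to pass the file name or format directly would still be preferable.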

Configuration

.NET 10

Labels: area-ai, bug