Description
If we create an ISpeechToTextClient from an OpenAIClient, an object of type OpenAISpeechToTextClient is instantiated.
As we can see in the following lines (extensions/src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAISpeechToTextClient.cs, lines 67 to 69 in 336462c):
string filename = audioSpeechStream is FileStream fileStream ?
    Path.GetFileName(fileStream.Name) : // Use the file name if we can get one from the stream.
    Filename; // Otherwise, use a default name; this is only used to create a header name in the multipart request.
And then (line 87 of the same file):
var transcription = (await _audioClient.TranscribeAudioAsync(audioSpeechStream, filename, ToOpenAITranscriptionOptions(options), cancellationToken).ConfigureAwait(false)).Value;
The client requires a file name to determine the audio format. Currently, if the stream isn't a FileStream, a default name with the .mp3 extension is used (line 29 of the same file):
private const string Filename = "audio.mp3";
So, if I have a WAV audio stream that is not a FileStream, the request is sent with the name audio.mp3 and I get the following exception:
System.ClientModel.ClientResultException: 'HTTP 400 (invalid_request_error: invalid_value)
Parameter: file
Audio file might be corrupted or unsupported'
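Until the format can be specified, one possible workaround is to hand the client a FileStream whose name carries the right extension, since the code above takes the file name from FileStream.Name. The following is only a minimal sketch under that assumption: ToTempFileStream is a hypothetical helper, not part of Microsoft.Extensions.AI, and it uses nothing beyond System.IO.

```csharp
using System;
using System.IO;

// Demo: wrap an in-memory stream so that it exposes a ".wav" file name.
using var source = new MemoryStream(new byte[] { 0x52, 0x49, 0x46, 0x46 }); // "RIFF"
using FileStream wavStream = ToTempFileStream(source, ".wav");
Console.WriteLine(Path.GetExtension(wavStream.Name)); // prints .wav

// Hypothetical helper: copies the remaining bytes of `source` into a temporary
// file with the given extension and returns it as a FileStream positioned at 0.
// The temporary file is deleted automatically when the returned stream is closed.
static FileStream ToTempFileStream(Stream source, string extension)
{
    string path = Path.Combine(
        Path.GetTempPath(),
        Guid.NewGuid().ToString("N") + extension);

    var fileStream = new FileStream(
        path,
        FileMode.CreateNew,
        FileAccess.ReadWrite,
        FileShare.None,
        bufferSize: 4096,
        options: FileOptions.DeleteOnClose);

    // CopyTo reads from the current position of the source stream.
    source.CopyTo(fileStream);
    fileStream.Position = 0;
    return fileStream;
}
```

In the reproduction steps, this would mean passing ToTempFileStream(transcriptionStream, ".wav") to GetTextAsync instead of the MemoryStream. It costs an extra copy to disk, so it is a stopgap rather than a fix for the underlying limitation.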
Reproduction Steps
#!/usr/bin/env dotnet
#:sdk Microsoft.NET.Sdk
#:property OutputType=Exe
#:property TargetFramework=net10.0
#:property ImplicitUsings=enable
#:property NoWarn=$(NoWarn);MEAI001
#:property PublishAot=false
#:package Azure.AI.OpenAI@2.9.0-beta.1
#:package Microsoft.Extensions.AI.OpenAI@10.6.0
#:package Microsoft.Extensions.Logging@10.0.8
#:package Microsoft.Extensions.Logging.Console@10.0.8
#:package NAudio@2.3.0
using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using NAudio.Wave;
Console.WriteLine("Press any key to start sample...");
Console.ReadKey();
Console.WriteLine("Recording audio... Press any key to stop recording.");
var waveFormat = new WaveFormat(44100, 1);
using var memoryStream = new MemoryStream();
using var waveIn = new WaveInEvent
{
    WaveFormat = waveFormat,
};
using (var waveStream = new WaveFileWriter(memoryStream, waveFormat))
{
    waveIn.DataAvailable += (_, e) =>
    {
        waveStream.Write(e.Buffer, 0, e.BytesRecorded);
    };
    waveIn.StartRecording();
    _ = Console.ReadKey();
    waveIn.StopRecording();
}
// Set these to your Azure OpenAI endpoint, API key, and model (deployment) name before running.
var endpoint = "";
var apiKey = "";
var model = "";
var azureClient = new AzureOpenAIClient(new(endpoint), new ApiKeyCredential(apiKey));
var audioClient = azureClient.GetAudioClient(model).AsISpeechToTextClient();
var audioBytes = memoryStream.ToArray();
using var transcriptionStream = new MemoryStream(audioBytes);
var transcription = await audioClient.GetTextAsync(transcriptionStream);
Console.WriteLine("Transcription:");
Console.WriteLine(transcription.ToString());
Expected behavior
It must be possible to specify the audio file format.
Actual behavior
Exception:
System.ClientModel.ClientResultException: 'HTTP 400 (invalid_request_error: invalid_value)
Parameter: file
Audio file might be corrupted or unsupported'
Configuration
.NET 10