Util
AudioEnergyValidator(energy_threshold, …) | A validator based on audio signal energy.
AudioReader(input[, block_dur, hop_dur, …]) | Class to read fixed-size chunks of audio data from a source.
Recorder(input[, block_dur, hop_dur, max_read]) | Class to read fixed-size chunks of audio data from a source and keep data in a cache.
make_duration_formatter(fmt) | Make and return a function used to format durations in seconds.
make_channel_selector(sample_width, channels[, selected]) | Create and return a callable used for audio channel selection.
-
auditok.util.make_duration_formatter(fmt)
Make and return a function used to format durations in seconds. Accepted format directives are:
- %S: absolute number of seconds with 3 decimals. This directive should be used alone.
- %i: milliseconds
- %s: seconds
- %m: minutes
- %h: hours
These last 4 directives should all be specified. They can be placed anywhere in the input string.
Parameters: fmt (str) – duration format.
Returns: formatter – a function that takes a duration in seconds (float) and returns a string that corresponds to that duration.
Return type: callable
Raises: TimeFormatError – if the format contains an unknown directive.
Examples
Using %S:
>>> formatter = make_duration_formatter("%S")
>>> formatter(123.589)
'123.589'
>>> formatter(123)
'123.000'
Using the other directives:
>>> formatter = make_duration_formatter("%h:%m:%s.%i")
>>> formatter(3600+120+3.25)
'01:02:03.250'
>>> formatter = make_duration_formatter("%h hrs, %m min, %s sec and %i ms")
>>> formatter(3600+120+3.25)
'01 hrs, 02 min, 03 sec and 250 ms'
>>> # omitting one of the 4 directives might result in a wrong duration
>>> formatter = make_duration_formatter("%m min, %s sec and %i ms")
>>> formatter(3600+120+3.25)
'02 min, 03 sec and 250 ms'
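To show how the directives map onto a duration, here is a minimal, self-contained sketch. `make_duration_formatter_sketch` is a hypothetical stand-in, not auditok's implementation: it assumes the duration is rounded to the nearest millisecond, and it raises `ValueError` where the real function raises `TimeFormatError`. Because every field is computed from the total and only the directives present in `fmt` are substituted, it reproduces the "omitted directive" behavior shown above.

```python
import re


def make_duration_formatter_sketch(fmt):
    """Return a callable that formats a duration (in seconds) according to fmt."""
    if fmt == "%S":
        # absolute number of seconds with 3 decimals (used alone)
        return lambda duration: "{:.3f}".format(duration)

    # Only %h, %m, %s and %i may be combined; anything else is rejected.
    unknown = set(re.findall(r"%(.)", fmt)) - set("hmsi")
    if unknown:
        raise ValueError("unknown directive(s): " + ", ".join(sorted(unknown)))

    def formatter(duration):
        millis = int(round(duration * 1000))  # assumption: round to nearest ms
        hours, rest = divmod(millis, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        fields = {"h": f"{hours:02d}", "m": f"{minutes:02d}",
                  "s": f"{seconds:02d}", "i": f"{millis:03d}"}
        # substitute each directive with its zero-padded field
        return re.sub(r"%([hmsi])", lambda m: fields[m.group(1)], fmt)

    return formatter


print(make_duration_formatter_sketch("%h:%m:%s.%i")(3600 + 120 + 3.25))  # 01:02:03.250
```

Dropping `%h` from the format does not fold hours into minutes, which is why the documented example prints `02 min` for a duration above one hour.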
-
auditok.util.make_channel_selector(sample_width, channels, selected=None)
Create and return a callable used for audio channel selection. The returned selector can be used as selector(audio_data) and returns data that contains the selected channel only.
Importantly, if selected is None or equals “any”, selector(audio_data) will separate and return a list of available channels: [data_channel_1, data_channel_2, …].
Note also that the returned selector expects bytes as input data but does not necessarily return a bytes object. In fact, in order to extract the desired channel (or compute the average channel if selected = “avg”), it first converts input data into an array.array (or numpy.ndarray) object. After the channel of interest is selected/computed, it is returned as such, without any reconversion to bytes. This behavior is intended for efficiency, because returned objects can be used directly as buffers of bytes. In any case, returned objects can be converted back to bytes using bytes(obj).
The exception is the special case channels = 1, in which input data is returned without any processing.
Parameters: - sample_width (int) – number of bytes used to encode one audio sample, should be 1, 2 or 4.
- channels (int) – number of channels of raw audio data that the returned selector should expect.
- selected (int or str, default: None) – audio channel to select and return when calling selector(raw_data). It should be an int >= -channels and < channels. If one of “mix”, “avg” or “average” is passed then selector will return the average channel of audio data. If None or “any”, return a list of all available channels at each call.
Returns: selector – a callable that can be used as selector(audio_data) and returns data that contains the channel of interest.
Return type: callable
Raises: ValueError
– if sample_width is not one of 1, 2 or 4, or if selected has an unexpected value.
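The selection rules can be illustrated with a standard-library sketch that de-interleaves raw samples with `array.array`. `select_channels` is a hypothetical helper written for this illustration, not auditok's selector: it assumes signed integer samples, assumes typecode `"i"` is 4 bytes wide (true on common platforms), and averages with integer floor division, which may differ from the library's rounding.

```python
from array import array

# Assumption: signed samples; typecode "i" is 4 bytes on this platform.
_TYPECODES = {1: "b", 2: "h", 4: "i"}


def select_channels(data, sample_width, channels, selected=None):
    """Hypothetical stand-in illustrating the documented selection rules."""
    if sample_width not in _TYPECODES:
        raise ValueError("sample_width must be 1, 2 or 4")
    if channels == 1:
        return data  # mono input is returned without any processing
    samples = array(_TYPECODES[sample_width], data)  # interleaved samples
    if selected in (None, "any"):
        # de-interleave: one array per channel
        return [samples[c::channels] for c in range(channels)]
    if selected in ("mix", "avg", "average"):
        frames = len(samples) // channels
        # integer average of each multi-channel frame (floor division)
        return array(samples.typecode,
                     (sum(samples[f * channels:(f + 1) * channels]) // channels
                      for f in range(frames)))
    if isinstance(selected, int) and -channels <= selected < channels:
        # normalize negative indices so the slice starts at the right channel
        return samples[selected % channels::channels]
    raise ValueError("unexpected value for selected: {!r}".format(selected))


stereo = array("h", [1, 10, 2, 20, 3, 30]).tobytes()
print(select_channels(stereo, 2, 2, 0).tolist())  # [1, 2, 3]
```

As in the description above, the result is an `array.array` rather than `bytes`; `bytes(result)` turns it back into a raw buffer because arrays expose the buffer protocol.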
-
class auditok.util.DataSource
Base class for objects passed to StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.
-
class auditok.util.DataValidator
Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement an is_valid() method.
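The validator contract amounts to a single method. The sketch below uses a local `DataValidator` stand-in built with `abc` that mirrors the documented requirement; it is not auditok's class, and `UpperCaseValidator` is a hypothetical subclass that treats upper-case characters as valid frames.

```python
from abc import ABC, abstractmethod


class DataValidator(ABC):
    """Local stand-in for auditok.util.DataValidator (illustration only)."""

    @abstractmethod
    def is_valid(self, data):
        """Return True if `data` should be considered valid."""


class UpperCaseValidator(DataValidator):
    """Hypothetical validator: a frame is valid if it is upper-case."""

    def is_valid(self, data):
        return data.isupper()


validator = UpperCaseValidator()
flags = [validator.is_valid(c) for c in "aaAAb"]
print(flags)  # [False, False, True, True, False]
```

Declaring `is_valid` abstract means a subclass that forgets to implement it fails at instantiation time rather than during tokenization.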
-
class auditok.util.StringDataSource(data)
Class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.
Parameters: data (str) – a string object used as data.
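The documented read() behavior of StringDataSource can be reproduced in a few lines. `StringDataSourceSketch` is a hypothetical re-implementation for illustration, not the library class; it only models the "one character per call, then None" contract described above.

```python
class StringDataSourceSketch:
    """Hypothetical re-implementation of the documented read() behavior."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self):
        # End of buffer reached: signal exhaustion with None.
        if self._pos >= len(self._data):
            return None
        char = self._data[self._pos]
        self._pos += 1  # move one step forward
        return char


source = StringDataSourceSketch("abc")
print([source.read() for _ in range(4)])  # ['a', 'b', 'c', None]
```

Because exhaustion is signaled with None, such a source can also be drained with `iter(source.read, None)`.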
-
class auditok.util.ADSFactory
Deprecated since version 2.0.0: ADSFactory will be removed in auditok 2.0.1, use instances of AudioReader instead.
Factory class that makes it easy to create an AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right AudioDataSource object.
There are many other features you may want an AudioDataSource object to have, such as: memorizing all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone), reading a fixed amount of data (also useful when reading from the microphone), and reading overlapping audio frames (often needed when doing a spectral analysis of data). ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.
-
static ads(**kwargs)
Create and return an AudioDataSource. The type and behavior of the object are determined by the supplied parameters. Called without any parameters, the class will read audio data from the available built-in microphone with the default parameters.
Parameters: - sampling_rate, sr – number of audio samples per second of the input audio stream.
- sample_width, sw – number of bytes per sample, must be one of 1, 2 or 4.
- channels, ch – number of audio channels; only a value of 1 is currently accepted.
- frames_per_buffer, fpb – number of samples of the PyAudio buffer.
- audio_source, asrc – AudioSource to read data from.
- filename, fn – create an AudioSource object using this file.
- data_buffer, db – build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to the io.BufferAudioSource constructor and used instead of default values.
- max_time, mt – maximum time (in seconds) to read. Default behavior: read until there is no more data available.
- record, rec – save all read data in cache. Provide a navigable object which has a rewind method.
- block_dur, bd – processing block duration in seconds. This represents the quantity of audio data to return each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms), the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of 0.025 * 8000 * 2 = 400 bytes at most. This parameter will be looked for (and used if available) before block_size. If neither parameter is given, block_dur will be set to 0.01 second (i.e. 10 ms).
- hop_dur, hd – quantity of data to skip from the current processing window. If hop_dur is supplied, there will be an overlap of block_dur - hop_dur between two adjacent blocks. This parameter will be looked for (and used if available) before hop_size. If neither parameter is given, hop_dur will be set to block_dur, which means that there will be no overlap between two consecutively read blocks.
- block_size, bs – number of samples to read each time the read method is called. Default: a block size that represents a window of 10 ms, so for a sampling rate of 16000 the default block_size is 160 samples, for a rate of 44100 block_size = 441 samples, etc.
- hop_size, hs – determines the number of overlapping samples between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, meaning that there is no overlap.
Returns: audio_data_source – an AudioDataSource object built with the input parameters.
Return type: AudioDataSource
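The sizing rules above (durations take precedence over sizes, a 10 ms default block, hop defaulting to the block) can be checked with a small sketch. `resolve_window` is a hypothetical helper for illustration only, not part of auditok; it assumes mono audio (the only value `ads()` accepts) and rounds durations to whole samples.

```python
def resolve_window(sampling_rate, sample_width, block_dur=None, block_size=None,
                   hop_dur=None, hop_size=None):
    """Hypothetical helper: turn ads()-style arguments into sample/byte counts."""
    if block_dur is not None:            # block_dur is looked for first
        block_size = int(round(block_dur * sampling_rate))
    elif block_size is None:             # neither given: 10 ms window
        block_size = int(round(0.01 * sampling_rate))
    if hop_dur is not None:              # hop_dur is looked for first
        hop_size = int(round(hop_dur * sampling_rate))
    elif hop_size is None:               # neither given: no overlap
        hop_size = block_size
    return {
        "block_size": block_size,                      # samples per read()
        "hop_size": hop_size,                          # samples to advance
        "overlap": block_size - hop_size,              # shared samples
        "bytes_per_read": block_size * sample_width,   # mono assumed
    }


print(resolve_window(8000, 2, block_dur=0.025)["bytes_per_read"])  # 400
```

With the defaults, `resolve_window(16000, 2)` gives a block of 160 samples and `resolve_window(44100, 2)` a block of 441, matching the figures quoted for block_size.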
-
auditok.util.AudioDataSource
alias of auditok.util.AudioReader
-
class auditok.util.AudioReader(input, block_dur=0.01, hop_dur=None, record=False, max_read=None, **kwargs)
Class to read fixed-size chunks of audio data from a source. A source can be a file on disk, standard input (with input = “-”) or the microphone. This is normally used by tokenization algorithms that expect source objects with a read function that returns a window of data of the same size at each call, except when the remaining data does not make up a full window.
Objects of this class can be set up to return audio windows with a given overlap and to record the whole stream for later access (useful when reading data from the microphone). They can also have a limit for the maximum amount of data to read.
Parameters: - input (str, bytes, AudioSource, AudioReader, AudioRegion or None) – input audio data. If the type of the passed argument is str, it should be a path to an existing audio file. “-” is interpreted as standard input. If the type is bytes, input is considered as a buffer of raw audio data. If None, read audio from the microphone. Every object that is not an AudioReader will be transformed, when possible, into an AudioSource before processing. If it is a str that refers to a raw audio file, bytes or None, audio parameters should be provided using kwargs (i.e., sampling_rate, sample_width and channels or their aliases).
- block_dur (float, default: 0.01) – length in seconds of audio windows to return at each read call.
- hop_dur (float, default: None) – length in seconds of data to skip from the previous window. If defined, it is used to compute the temporal overlap between the previous and current window (namely overlap = block_dur - hop_dur). The default, None, means that consecutive windows do not overlap.
- record (bool, default: False) – whether to record read audio data for later access. If True, audio data can be retrieved by first calling rewind(), then using the data property. Note that once rewind() is called, no new data will be read from source (subsequent read() calls will read data from cache) and that there’s no need to call rewind() again to access the data property.
- max_read (float, default: None) – maximum amount of audio data to read in seconds. Default is None, meaning that data will be read until the end of stream is reached or, when reading from the microphone, a Ctrl-C is sent.
When input is None, of type bytes or a raw audio file, some of the following kwargs are mandatory.
Other Parameters: - audio_format, fmt (str) – type of audio data (e.g., wav, ogg, flac, raw, etc.). This will only be used if input is a string path to an audio file. If not given, audio type will be guessed from file name extension or from file header.
- sampling_rate, sr (int) – sampling rate of audio data. Required if input is a raw audio file, is a bytes object or None (i.e., read from microphone).
- sample_width, sw (int) – number of bytes used to encode one audio sample, typically 1, 2 or 4. Required for raw data, see sampling_rate.
- channels, ch (int) – number of channels of audio data. Required for raw data, see sampling_rate.
- use_channel, uc ({None, “any”, “mix”, “avg”, “average”} or int) – which channel to use for splitting if input has multiple audio channels. Regardless of which channel is used for splitting, returned audio events contain data from all the channels of input. The following values are accepted:
  - None (alias “any”): accept audio activity from any channel, even if other channels are silent. This is the default behavior.
  - “mix” (alias “avg” or “average”): mix down all channels (i.e., compute the average channel) and split the resulting channel.
  - int (>= 0, < channels): use one channel, specified by its integer id, for splitting.
- large_file (bool, default: False) – if True, AND if input is a path to a wav or a raw audio file (and only these two formats), then audio data is lazily loaded to memory (i.e., one analysis window at a time). Otherwise the whole file is loaded to memory before splitting. Set to True if the size of the file is larger than available memory.
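How block_dur and hop_dur shape successive windows can be sketched over an in-memory buffer of raw audio. `iter_windows` is a hypothetical helper, not part of auditok: it assumes interleaved PCM bytes, rounds durations to whole samples, and its stopping rule (stop after the window that reaches the end of the data) is a choice of this sketch rather than a description of AudioReader internals.

```python
def iter_windows(data, sampling_rate, sample_width, channels,
                 block_dur=0.01, hop_dur=None):
    """Yield successive (possibly overlapping) windows of raw audio bytes."""
    frame_bytes = sample_width * channels            # bytes per multi-channel sample
    block = int(round(block_dur * sampling_rate))    # samples per window
    # hop_dur=None means no overlap: advance by a full window
    hop = block if hop_dur is None else int(round(hop_dur * sampling_rate))
    if block <= 0 or hop <= 0:
        raise ValueError("block_dur and hop_dur must cover at least one sample")
    window_bytes = block * frame_bytes
    step_bytes = hop * frame_bytes
    start = 0
    while start < len(data):
        # the last window may be shorter than a full window
        yield data[start:start + window_bytes]
        if start + window_bytes >= len(data):
            break
        start += step_bytes


data = bytes(range(10))  # 10 mono 8-bit samples at 100 Hz
# block 4 samples, hop 2 samples: overlap = block_dur - hop_dur = 2 samples
print([list(w) for w in iter_windows(data, 100, 1, 1, block_dur=0.04, hop_dur=0.02)])
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Without hop_dur the same buffer yields windows of 4, 4 and 2 samples, the last one being the partial window mentioned in the class description.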
-
class auditok.util.Recorder(input, block_dur=0.01, hop_dur=None, max_read=None, **kwargs)
Class to read fixed-size chunks of audio data from a source and keep data in a cache. Using this class is equivalent to initializing AudioReader with record=True. For more information about the other parameters, see AudioReader.
Once the desired amount of data is read, you can call the rewind() method, then get the recorded data via the data attribute. You can also re-read cached data one window at a time by calling read().
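The record/rewind workflow can be modeled with a small caching reader. `RecorderSketch` is a hypothetical class that wraps an iterable of byte windows; it is not auditok's Recorder. It follows the documented rules (after rewind(), read() serves windows from the cache and no new data is pulled from the source), but unlike the description above it does not require rewind() before accessing data.

```python
class RecorderSketch:
    """Hypothetical caching reader over an iterable of byte windows."""

    def __init__(self, windows):
        self._source = iter(windows)
        self._cache = []
        self._rewound = False
        self._pos = 0

    def read(self):
        if self._rewound:
            # after rewind(): serve cached windows only, never the source
            if self._pos >= len(self._cache):
                return None
            window = self._cache[self._pos]
            self._pos += 1
            return window
        window = next(self._source, None)
        if window is not None:
            self._cache.append(window)  # record every window read
        return window

    def rewind(self):
        self._rewound = True
        self._pos = 0

    @property
    def data(self):
        # this sketch does not enforce calling rewind() first
        return b"".join(self._cache)


rec = RecorderSketch([b"ab", b"cd", b"ef"])
rec.read(); rec.read()
rec.rewind()
print(rec.data)  # b'abcd'
```

Only the two windows actually read are cached, so `b"ef"` never appears in the recorded data even after rewinding.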
-
class auditok.util.AudioEnergyValidator(energy_threshold, sample_width, channels, use_channel=None)
A validator based on audio signal energy. For an input window of N audio samples (see AudioEnergyValidator.is_valid()), the energy is computed as:
\[energy = 20 \log\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N} a_i^2}\right)\]
where a_i is the i-th audio sample.
Parameters: - energy_threshold (float) – minimum energy that audio window should have to be valid.
- sample_width (int) – size in bytes of one audio sample.
- channels (int) – number of channels of audio data.
- use_channel ({None, "any", "mix", "avg", "average"} or int) –
channel to use for energy computation. The following values are accepted:
- None (alias “any”) : compute energy for each of the channels and return the maximum value.
- “mix” (alias “avg” or “average”) : compute the average channel then compute its energy.
- int (>= 0 , < channels) : compute the energy of the specified channel and ignore the other ones.
Returns: energy – energy of the audio window.
Return type: float
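The energy formula and the use_channel options can be sketched with the standard library. `window_energy` and `is_valid` below are hypothetical helpers, not auditok's validator. They assume that the logarithm is base 10 (the usual decibel convention implied by the factor 20), that samples are signed integers, that a window counts as valid when its energy is at least the threshold, and that silence has energy -inf; all of these are choices of this sketch.

```python
import math
from array import array

# Assumption: signed samples; typecode "i" is 4 bytes on this platform.
_TYPECODES = {1: "b", 2: "h", 4: "i"}


def _energy(samples):
    """20 * log10(RMS); -inf for an empty or silent window (sketch choice)."""
    if not samples:
        return float("-inf")
    mean_square = sum(a * a for a in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square))


def window_energy(data, sample_width, channels, use_channel=None):
    samples = array(_TYPECODES[sample_width], data)
    per_channel = [samples[c::channels] for c in range(channels)]
    if use_channel in (None, "any"):
        # energy of each channel, keep the maximum
        return max(_energy(ch) for ch in per_channel)
    if use_channel in ("mix", "avg", "average"):
        # average channel first, then its energy
        mixed = [sum(frame) / channels for frame in zip(*per_channel)]
        return _energy(mixed)
    return _energy(per_channel[use_channel])


def is_valid(data, energy_threshold, sample_width, channels, use_channel=None):
    return window_energy(data, sample_width, channels, use_channel) >= energy_threshold


stereo = array("h", [100, 10] * 4).tobytes()  # left = 100, right = 10
print(window_energy(stereo, 2, 2))  # 40.0 (max of 40 dB and 20 dB)
```

With use_channel=None a single loud channel is enough for the window to pass, while "mix" lowers the energy when the other channels are quieter.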