auditok.util
Class summary
DataSource | Base class for objects passed to auditok.core.StreamTokenizer.tokenize().
StringDataSource (data) | A class that represents a DataSource as a string buffer.
ADSFactory | Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
ADSFactory.AudioDataSource (audio_source, …) | Base class for AudioDataSource objects.
ADSFactory.ADSDecorator (ads) | Base decorator class for AudioDataSource objects.
ADSFactory.OverlapADS (ads, hop_size) | A class for AudioDataSource objects that can read and return overlapping audio frames.
ADSFactory.LimiterADS (ads, max_time) | A class for AudioDataSource objects that can read a fixed amount of data.
ADSFactory.RecorderADS (ads) | A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
DataValidator | Base class for a validator object used by core.StreamTokenizer to check if read data is valid.
AudioEnergyValidator (sample_width[, …]) | The most basic auditok audio frame validator.
class auditok.util.DataSource
Base class for objects passed to auditok.core.StreamTokenizer.tokenize(). Subclasses should implement a DataSource.read() method.
class auditok.util.DataValidator
Base class for a validator object used by core.StreamTokenizer to check if read data is valid. Subclasses should implement the is_valid() method.
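The validator contract described above can be illustrated with a minimal sketch. The base class here is a local stand-in rather than an import from auditok, and UpperCaseValidator is a hypothetical name used only for illustration.

```python
class DataValidator:
    """Local stand-in for the base class; not imported from auditok."""

    def is_valid(self, data):
        raise NotImplementedError


class UpperCaseValidator(DataValidator):
    """Hypothetical validator: accepts upper-case characters only."""

    def is_valid(self, data):
        return data is not None and data.isupper()


validator = UpperCaseValidator()
# One flag per character of the input string
flags = [validator.is_valid(c) for c in "aaBBc"]
```

A tokenizer would call is_valid() on each item it reads and use the resulting flags to decide where valid regions start and end.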
class auditok.util.StringDataSource(data)
A class that represents a DataSource as a string buffer. Each call to DataSource.read() returns one character and moves one step forward. If the end of the buffer is reached, read() returns None.
Parameters:
- data : a basestring object.
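The read-one-character behavior described above can be sketched as follows. StringSource is a hypothetical stand-in written for this illustration, not the library's own class.

```python
class StringSource:
    """Hypothetical stand-in mimicking the behavior described above."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self):
        # Return one character and advance; None once the buffer is exhausted.
        if self._pos >= len(self._data):
            return None
        char = self._data[self._pos]
        self._pos += 1
        return char


source = StringSource("abc")
chars = []
while True:
    c = source.read()
    if c is None:
        break
    chars.append(c)
```

The loop-until-None pattern is the same one used when draining any data source in the examples further down.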
class auditok.util.ADSFactory
Factory class that makes it easy to create an ADSFactory.AudioDataSource object that implements DataSource and can therefore be passed to auditok.core.StreamTokenizer.tokenize().
Whether you read audio data from a file, the microphone or a memory buffer, this factory instantiates and returns the right ADSFactory.AudioDataSource object.
There are many other features you may want your ADSFactory.AudioDataSource object to have, such as:
- memorize all read audio data so that you can rewind and reuse it (especially useful when reading data from the microphone),
- read a fixed amount of data (also useful when reading from the microphone),
- read overlapping audio frames (often needed when doing a spectral analysis of data).
ADSFactory.ads() automatically creates and returns an object with the desired behavior according to the supplied keyword arguments.
class ADSFactory.AudioDataSource(audio_source, block_size)
Base class for AudioDataSource objects. It inherits from DataSource and encapsulates an AudioSource object.
class ADSFactory.LimiterADS(ads, max_time)
A class for AudioDataSource objects that can read a fixed amount of data. This can be useful when reading data from the microphone or from large audio files.
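As a rough illustration of the limiting idea, the sketch below wraps a source and stops once a byte budget is spent. BufferSource and Limiter are hypothetical names, the budget is expressed in bytes rather than seconds for simplicity, and nothing here is taken from the LimiterADS implementation.

```python
class BufferSource:
    """Hypothetical in-memory source returning fixed-size byte blocks."""

    def __init__(self, data, block_bytes):
        self._data = data
        self._block_bytes = block_bytes
        self._pos = 0

    def read(self):
        if self._pos >= len(self._data):
            return None
        block = self._data[self._pos:self._pos + self._block_bytes]
        self._pos += len(block)
        return block


class Limiter:
    """Hypothetical decorator: stops after max_bytes have been returned."""

    def __init__(self, source, max_bytes):
        self._source = source
        self._remaining = max_bytes

    def read(self):
        if self._remaining <= 0:
            return None
        block = self._source.read()
        if block is None:
            return None
        block = block[:self._remaining]  # truncate the final block if needed
        self._remaining -= len(block)
        return block


# Limit of 10 bytes over a 16-byte buffer read 4 bytes at a time
limited = Limiter(BufferSource(b"abcdefghijklmnop", 4), max_bytes=10)
blocks = []
while True:
    b = limited.read()
    if b is None:
        break
    blocks.append(b)
```

A time limit in seconds would translate to a byte budget of roughly max_time * sampling_rate * sample_width * channels.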
class ADSFactory.OverlapADS(ads, hop_size)
A class for AudioDataSource objects that can read and return overlapping audio frames.
class ADSFactory.RecorderADS(ads)
A class for AudioDataSource objects that can record all audio data they read, with a rewind facility.
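The record-and-rewind idea can be sketched as a caching wrapper. ListSource and Recorder are hypothetical names; the real RecorderADS may differ, in particular in what happens once the replayed cache is exhausted.

```python
class ListSource:
    """Hypothetical source that returns pre-cut blocks, then None."""

    def __init__(self, blocks):
        self._blocks = list(blocks)
        self._index = 0

    def read(self):
        if self._index >= len(self._blocks):
            return None
        block = self._blocks[self._index]
        self._index += 1
        return block


class Recorder:
    """Hypothetical decorator: caches every block read and can replay them."""

    def __init__(self, source):
        self._source = source
        self._cache = []
        self._replay_index = None  # None means "still reading from the source"

    def read(self):
        if self._replay_index is not None:
            # Replay mode: serve blocks from the cache only.
            if self._replay_index >= len(self._cache):
                return None
            block = self._cache[self._replay_index]
            self._replay_index += 1
            return block
        block = self._source.read()
        if block is not None:
            self._cache.append(block)
        return block

    def rewind(self):
        # Switch to replaying cached blocks from the start.
        self._replay_index = 0


rec = Recorder(ListSource([b"ab", b"cd", b"ef"]))
first_pass = [rec.read(), rec.read(), rec.read(), rec.read()]
rec.rewind()
second_pass = [rec.read(), rec.read(), rec.read(), rec.read()]
```

Caching matters most for live input such as a microphone, where the original data cannot be read a second time.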
static ADSFactory.ads(**kwargs)
Create and return an ADSFactory.AudioDataSource. The type and behavior of the object depend on the supplied parameters.
Parameters:
- No parameters :
- read audio data from the available built-in microphone with the default parameters. The returned ADSFactory.AudioDataSource encapsulates an io.PyAudioSource object, so the next four parameters, if supplied, are used instead of their default values.
- sampling_rate, sr : (int)
- number of samples per second. Default = 16000.
- sample_width, sw : (int)
- number of bytes per sample (must be in (1, 2, 4)). Default = 2
- channels, ch : (int)
- number of audio channels. Default = 1 (only this value is currently accepted)
- frames_per_buffer, fpb : (int)
- number of samples of PyAudio buffer. Default = 1024.
- audio_source, asrc : an AudioSource object
- read data from this audio source
- filename, fn : (string)
- build an io.AudioSource object using this file (currently only wave format is supported)
- data_buffer, db : (string)
- build an io.BufferAudioSource using data in data_buffer. If this keyword is used, sampling_rate, sample_width and channels are passed to io.BufferAudioSource constructor and used instead of default values.
- max_time, mt : (float)
- maximum time (in seconds) to read. Default behavior: read until there is no more data available.
- record, rec : (bool)
- save all read data in a cache. The returned object is navigable and provides a rewind method. Default = False.
- block_dur, bd : (float)
- processing block duration in seconds. This is the quantity of audio data returned each time the read() method is invoked. If block_dur is 0.025 (i.e. 25 ms), the sampling rate is 8000 and the sample width is 2 bytes, read() returns a buffer of 0.025 * 8000 * 2 = 400 bytes at most. This parameter is looked for (and used if available) before block_size. If neither parameter is given, block_dur is set to 0.01 second (i.e. 10 ms).
- hop_dur, hd : (float)
- quantity of data to skip from the current processing window. If hop_dur is supplied, there is an overlap of block_dur - hop_dur between two adjacent blocks. This parameter is looked for (and used if available) before hop_size. If neither parameter is given, hop_dur is set to block_dur, which means that there is no overlap between two consecutively read blocks.
- block_size, bs : (int)
- number of samples to read each time the read method is called. Default: a block size that represents a window of 10ms, so for a sampling rate of 16000, the default block_size is 160 samples, for a rate of 44100, block_size = 441 samples, etc.
- hop_size, hs : (int)
- determines the number of overlapping samples between two adjacent read windows. For a hop_size of value N, the overlap is block_size - N. Default: hop_size = block_size, which means that there is no overlap.
Returns: An AudioDataSource object that has the desired features.
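The size arithmetic in the parameter list above can be checked with a small sketch. The helper names are hypothetical and not part of auditok; rounding to the nearest sample is a choice made in this sketch, not a documented behavior of the library.

```python
def block_size_from_duration(block_dur, sampling_rate):
    """Number of samples covered by block_dur seconds (sketch: nearest sample)."""
    return round(block_dur * sampling_rate)


def bytes_per_block(block_size, sample_width, channels=1):
    """Maximum number of bytes returned by one read of block_size samples."""
    return block_size * sample_width * channels


def overlap(block_size, hop_size):
    """Samples shared by two adjacent windows."""
    return block_size - hop_size


# 25 ms at 8000 Hz with 2-byte samples, as in the block_dur description
size_8k = block_size_from_duration(0.025, 8000)
bytes_8k = bytes_per_block(size_8k, sample_width=2)

# Default 10 ms window at two common sampling rates
size_16k = block_size_from_duration(0.01, 16000)
size_44k = block_size_from_duration(0.01, 44100)
```

With block_size = 4 and hop_size = 2, overlap() gives 2 samples shared between consecutive windows; with hop_size equal to block_size it gives 0.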
Examples:
- Create an AudioDataSource that reads data from the microphone (requires PyAudio) with default audio parameters:
    from auditok import ADSFactory
    ads = ADSFactory.ads()
    ads.get_sampling_rate()
    16000
    ads.get_sample_width()
    2
    ads.get_channels()
    1
- Create an AudioDataSource that reads data from the microphone with a sampling rate of 48 kHz:
    from auditok import ADSFactory
    ads = ADSFactory.ads(sr=48000)
    ads.get_sampling_rate()
    48000
- Create an AudioDataSource that reads data from a wave file:
    import auditok
    from auditok import ADSFactory
    ads = ADSFactory.ads(fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    ads.get_sampling_rate()
    44100
    ads.get_sample_width()
    2
    ads.get_channels()
    1
- Define the size of read blocks as 20 ms:
    import auditok
    from auditok import ADSFactory
    '''
    we know the sampling rate for the previous file is 44100 samples/second
    so 10 ms are equivalent to 441 samples and 20 ms to 882
    '''
    block_size = 882
    ads = ADSFactory.ads(bs = block_size, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    ads.open()
    # read one block
    data = ads.read()
    ads.close()
    len(data)
    1764
    assert len(data) == ads.get_sample_width() * block_size
- Define block size as a duration (use block_dur or bd):
    import auditok
    from auditok import ADSFactory
    dur = 0.25 # second
    ads = ADSFactory.ads(bd = dur, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    '''
    we know the sampling rate for the previous file is 44100 samples/second
    for a block duration of 250 ms, block size should be 0.25 * 44100 = 11025
    '''
    ads.get_block_size()
    11025
    assert ads.get_block_size() == int(0.25 * 44100)
    ads.open()
    # read one block
    data = ads.read()
    ads.close()
    len(data)
    22050
    assert len(data) == ads.get_sample_width() * ads.get_block_size()
- Read overlapping blocks (one of hop_size, hs, hop_dur or hd > 0):
For better readability, we use auditok.io.BufferAudioSource with a string buffer:
    import auditok
    from auditok import ADSFactory
    '''
    we supply a data buffer instead of a file (keyword 'data_buffer' or 'db')
    sr : sampling rate = 16 samples/sec
    sw : sample width = 1 byte
    ch : channels = 1
    '''
    buffer = "abcdefghijklmnop" # 16 bytes = 1 second of data
    bd = 0.250 # block duration = 250 ms = 4 bytes
    hd = 0.125 # hop duration = 125 ms = 2 bytes
    ads = ADSFactory.ads(db = buffer, bd = bd, hd = hd, sr = 16, sw = 1, ch = 1)
    ads.open()
    ads.read()
    'abcd'
    ads.read()
    'cdef'
    ads.read()
    'efgh'
    ads.read()
    'ghij'
    data = ads.read()
    assert data == 'ijkl'
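The window sequence in the overlapping example above can be reproduced without auditok by a plain sliding window. The function below is a hypothetical sketch; how OverlapADS handles the end of the stream is not specified in this section, so stopping at the last full window is an assumption of the sketch.

```python
def overlapping_blocks(data, block_size, hop_size):
    """Yield windows of block_size items, advancing hop_size each time."""
    start = 0
    while start < len(data):
        yield data[start:start + block_size]
        if start + block_size >= len(data):
            break  # assumption: stop once a window reaches the end of the data
        start += hop_size


sampling_rate = 16                      # samples per second, 1 byte per sample
block_size = int(0.250 * sampling_rate) # 250 ms window
hop_size = int(0.125 * sampling_rate)   # 125 ms hop

blocks = list(overlapping_blocks("abcdefghijklmnop", block_size, hop_size))
```

Each window shares block_size - hop_size characters with the one before it, which is why the first five windows match the reads shown in the example.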
- Limit the amount of read data (use max_time or mt):
    import auditok
    from auditok import ADSFactory
    '''
    We know the audio file is longer than 2.25 seconds
    We want to read up to 2.25 seconds of audio data
    '''
    ads = ADSFactory.ads(mt = 2.25, fn=auditok.dataset.was_der_mensch_saet_mono_44100_lead_trail_silence)
    ads.open()
    data = []
    while True:
        d = ads.read()
        if d is None:
            break
        data.append(d)
    ads.close()
    data = b''.join(data)
    assert len(data) == int(ads.get_sampling_rate() * 2.25 * ads.get_sample_width() * ads.get_channels())
class auditok.util.AudioEnergyValidator(sample_width, energy_threshold=45)
The most basic auditok audio frame validator. This validator computes the log energy of an input audio frame and returns True if the result is >= a given threshold, False otherwise.
Parameters: - sample_width : (int)
- Number of bytes of one audio sample. This is used to convert data from basestring or Bytes to an array of floats.
- energy_threshold : (float)
- A threshold used to check whether an input data buffer is valid.
is_valid(data)
Check if data is valid. Audio data will be converted into an array (of signed values) of which the log energy is computed. Log energy is computed as follows:
    arr = AudioEnergyValidator._convert(signal, sample_width)
    energy = float(numpy.dot(arr, arr)) / len(arr)
    log_energy = 10. * numpy.log10(energy)
Parameters: - data : either a string or a Bytes buffer
- data is converted into a numerical array using the sample_width given in the constructor.
Returns: True if log_energy >= energy_threshold, False otherwise.
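The log-energy check can be reproduced with the standard library alone, as in the sketch below. It assumes little-endian signed samples and maps empty or all-zero frames to negative infinity; both are assumptions of this sketch, and the function names are hypothetical rather than part of auditok.

```python
import math
import struct

# Signed struct codes for 1-, 2- and 4-byte samples (standard sizes with "<")
_FORMATS = {1: "b", 2: "h", 4: "i"}


def log_energy(data, sample_width):
    """10 * log10 of the mean squared sample value of a bytes frame."""
    n = len(data) // sample_width
    if n == 0:
        return float("-inf")  # assumption: an empty frame has no energy
    fmt = "<%d%s" % (n, _FORMATS[sample_width])
    samples = struct.unpack(fmt, data[:n * sample_width])
    energy = sum(s * s for s in samples) / n
    if energy == 0:
        return float("-inf")  # assumption: silence maps to -inf in this sketch
    return 10.0 * math.log10(energy)


def is_valid(data, sample_width, energy_threshold=45):
    """True if the frame's log energy reaches the threshold."""
    return log_energy(data, sample_width) >= energy_threshold


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)  # mean square = 1e6
quiet = struct.pack("<4h", 1, -1, 1, -1)             # mean square = 1
```

With the default threshold of 45, the loud frame (about 60 dB in this scale) is accepted and the quiet frame (0 dB) is rejected.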