LMMs-Lab is an open research organization founded by students and faculty from NTU, Singapore, working in close collaboration with research labs and companies worldwide. Our released models cover vision, audio, and text. This page documents the released models.
Model List
Aero-1-Audio: Aero is a compact audio model capable of handling a range of audio tasks, including speech recognition, audio understanding, and audio instruction following.
This guide helps you get started quickly with models from lmms-lab. We provide examples for deployment with Hugging Face Transformers and vLLM.
You can find all the models on the Hugging Face Hub.
Aero
Hugging Face
To get started with Aero quickly, we recommend trying inference with Transformers first, using Python 3.10 or higher and PyTorch 2.3 or higher.
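If you want to verify your environment before downloading the model, a quick check like the sketch below works; the version thresholds simply mirror the recommendation above.

```python
import sys

import torch

# Recommended minimums from this guide: Python 3.10+ and PyTorch 2.3+
assert sys.version_info >= (3, 10), f"Python 3.10+ recommended, found {sys.version.split()[0]}"
torch_major_minor = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert torch_major_minor >= (2, 3), f"PyTorch 2.3+ recommended, found {torch.__version__}"
print("Environment looks OK:", sys.version.split()[0], torch.__version__)
```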
The following is a quick start using Transformers:
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
import librosa


def load_audio():
    return librosa.load(librosa.ex("libri1"), sr=16000)[0]


processor = AutoProcessor.from_pretrained("lmms-lab/Aero-1-Audio-1.5B", trust_remote_code=True)
# We encourage using flash attention 2 for better performance
# Please install it with `pip install --no-build-isolation flash-attn`
# If you do not want flash attention, use sdpa or eager instead
model = AutoModelForCausalLM.from_pretrained(
    "lmms-lab/Aero-1-Audio-1.5B",
    device_map="cuda",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
model.eval()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "audio_url",
                "audio": "placeholder",
            },
            {
                "type": "text",
                "text": "Please transcribe the audio",
            },
        ],
    }
]

audios = [load_audio()]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, audios=audios, sampling_rate=16000, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, eos_token_id=151645, max_new_tokens=4096)
cont = outputs[:, inputs["input_ids"].shape[-1]:]
print(processor.batch_decode(cont, skip_special_tokens=True)[0])
```
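The example above transcribes a sample clip bundled with librosa. To run the same pipeline on your own recording, you can load a local file instead; the file name below is a placeholder, and resampling to 16 kHz matches the sampling_rate passed to the processor.

```python
import librosa


# Hypothetical path -- replace with your own recording.
# librosa resamples to 16 kHz so it matches sampling_rate=16000 used above.
def load_local_audio(path="my_recording.wav"):
    waveform, _ = librosa.load(path, sr=16000)
    return waveform


audios = [load_local_audio()]
# The rest of the example (apply_chat_template, processor, generate) stays unchanged.
```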
Batch inference with Transformers is also supported; here is a simple example:
```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
import librosa


def load_audio():
    return librosa.load(librosa.ex("libri1"), sr=16000)[0]


def load_audio_2():
    return librosa.load(librosa.ex("libri2"), sr=16000)[0]


processor = AutoProcessor.from_pretrained("lmms-lab/Aero-1-Audio-1.5B", trust_remote_code=True)
# We encourage using flash attention 2 for better performance
# Please install it with `pip install --no-build-isolation flash-attn`
# If you do not want flash attention, use sdpa or eager instead
model = AutoModelForCausalLM.from_pretrained(
    "lmms-lab/Aero-1-Audio-1.5B",
    device_map="cuda",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
model.eval()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "audio_url",
                "audio": "placeholder",
            },
            {
                "type": "text",
                "text": "Please transcribe the audio",
            },
        ],
    }
]
# Duplicate the conversation to form a batch of two requests
messages = [messages, messages]
audios = [load_audio(), load_audio_2()]

# Left padding keeps the generated tokens aligned at the end of each sequence
processor.tokenizer.padding_side = "left"
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, audios=audios, sampling_rate=16000, return_tensors="pt", padding=True)
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, eos_token_id=151645, pad_token_id=151643, max_new_tokens=4096)
cont = outputs[:, inputs["input_ids"].shape[-1]:]
print(processor.batch_decode(cont, skip_special_tokens=True))
```
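Each conversation in the batch can carry its own instruction, so you can, for example, transcribe one clip and ask a free-form question about the other. The sketch below only changes how messages is built (the second prompt text is illustrative); the processor and generate calls from the batched example above stay the same.

```python
def make_message(prompt_text):
    # Build a single-turn conversation with one audio placeholder and one text prompt
    return [
        {
            "role": "user",
            "content": [
                {"type": "audio_url", "audio": "placeholder"},
                {"type": "text", "text": prompt_text},
            ],
        }
    ]


# One conversation per audio clip, each with its own instruction
messages = [
    make_message("Please transcribe the audio"),
    make_message("Briefly describe what is happening in this audio"),
]
audios = [load_audio(), load_audio_2()]
# Feed `messages` and `audios` into the same processor and generate calls as above.
```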
vLLM
To deploy using vLLM, you can install vLLM with this script: