# Getting Started

## Prerequisites

- CUDA 12.0+
- CMake 3.20+
- Python 3.10+
- GPU with 80GB+ VRAM (A100/H100 recommended)
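
A quick way to confirm these prerequisites before building (assuming `nvcc`, `cmake`, and `nvidia-smi` are on your PATH):

```bash
# Check toolchain versions against the list above
nvcc --version      # expect CUDA 12.0+
cmake --version     # expect 3.20+
python3 --version   # expect 3.10+

# Check GPU model and total VRAM (expect 80GB+)
nvidia-smi --query-gpu=name,memory.total --format=csv
```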

## Installation

```bash
# Clone repository
git clone https://github.com/rand0mdevel0per/sxlm.git
cd sxlm
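
# Optional (assumption: not required by the project, just common practice):
# create an isolated virtual environment before installing dependencies
python3 -m venv .venv
source .venv/bin/activate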
# Install Python dependencies
pip install -r requirements.txt
# Build CUDA components
cmake -B build -DCMAKE_BUILD_TYPE=Release
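
# Assumption: if CMake does not detect your GPU's compute capability,
# set it explicitly (80 for A100, 90 for H100), e.g.:
#   cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=90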
cmake --build build --config Release -j8
```
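
Assuming the build succeeds, the test binary referenced in the Quick Test section below should exist; a quick sanity check:

```bash
# Confirm the integration test binary was produced
test -f build/Release/integration_test && echo "build OK"
```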

## Running the API Server

```bash
# Start server with uvicorn
python src/python/run_server.py
# Or use uvicorn directly
uvicorn src.python.api.server:app --host 0.0.0.0 --port 8000
```

The API will be available at http://localhost:8000.
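
During development it can be convenient to let uvicorn restart on code changes. `--reload` is a standard uvicorn flag (assumption: nothing in this project prevents its use):

```bash
# Development mode: auto-restart the server when source files change
uvicorn src.python.api.server:app --host 0.0.0.0 --port 8000 --reload
```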

## Quick Test

```bash
# Run integration tests
./build/Release/integration_test
# Test API endpoint
curl -X POST http://localhost:8000/inference \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello Quila"}'
```