I am very interested in comparing different LLMs against each other. I would love to see a feature like benchmarking with my own tests, or comparing the output of multiple models.