yeah i'm excited about how this will augment llamaindex. there are obviously still compute/cost considerations in putting 32k tokens into a single LLM call, and I'd love to see how the expanded context window expands current use cases while still introducing the necessary cost/latency considerations
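just to put rough numbers on the cost side, here's a back-of-envelope sketch — the per-1k-token prices are hypothetical placeholders, not actual pricing for any specific model:

```python
# Rough cost estimate for a single 32k-token call.
# PRICE_PER_1K_* are assumed placeholder rates -- swap in real pricing.
PRICE_PER_1K_PROMPT = 0.06      # assumed $/1k prompt tokens
PRICE_PER_1K_COMPLETION = 0.12  # assumed $/1k completion tokens

prompt_tokens = 32_000
completion_tokens = 1_000

cost = (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT \
     + (completion_tokens / 1000) * PRICE_PER_1K_COMPLETION
print(f"~${cost:.2f} per call")  # ~$2.04 at these placeholder rates
```

even with made-up rates, filling the whole window on every call adds up fast once you're doing thousands of calls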
Yeah, it's not clear to me what the scaling limitations/factors of gpt-3+ are. Like, are 32k token windows a plausible near-future goal for most users? Or are the computational resources needed for that untenable at massive scale 🤷🏽‍♀️
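fwiw part of the answer is that vanilla self-attention scales quadratically with sequence length, so 4k → 32k is roughly a 64x increase in the attention term alone. a rough sketch of just that term (it ignores the per-token linear/MLP FLOPs, which usually dominate at shorter lengths, so treat it as intuition, not a full cost model):

```python
# Relative cost of the self-attention term alone, O(n^2 * d).
# d_model=4096 is an arbitrary assumed hidden size; the ratio
# between two sequence lengths doesn't depend on it anyway.
def attn_flops(n_tokens: int, d_model: int = 4096) -> float:
    return n_tokens ** 2 * d_model

ratio = attn_flops(32_000) / attn_flops(4_000)
print(f"32k vs 4k attention cost: {ratio:.0f}x")  # 64x
```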