Does anybody know how to use streamlit

Does anybody know how to use streamlit with the latest "load_index_from_storage" function? The index returned is not picklable, and thus using st.cache_data does not work ...

Other question: any recommendation for a lib or a module that would split a large JSON file like a vector_store.json file? If the file is >100 MB, it cannot be uploaded to GitHub (unless we use GitHub Large File Storage -- whatever that is) ... I'd prefer to split the file, store it in parts on GitHub, then when it's time to read it back, simply "join" the JSON again before feeding it to a storage_context ... unless GitHub large file storage is really cool and the way to go ... 😉
18 comments
What's the streamlit function look like that you have currently? There are ways around the cache issue
(Also not sure on the second part, you might have to craft your own script to do that lol)
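A rough sketch of such a script, assuming a plain byte-level split/join is fine (the part naming and the 90 MB chunk size below are just placeholders):
Plain Text
import glob

CHUNK_SIZE = 90 * 1024 * 1024  # stay safely under GitHub's 100 MB limit

def split_file(path, chunk_size=CHUNK_SIZE):
    # writes vector_store.json.part000, .part001, ... next to the original
    with open(path, "rb") as src:
        i = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            with open(f"{path}.part{i:03d}", "wb") as dst:
                dst.write(chunk)
            i += 1

def join_file(path):
    # concatenates the parts back into the original file before loading the index
    with open(path, "wb") as dst:
        for part in sorted(glob.glob(f"{path}.part*")):
            with open(part, "rb") as src:
                dst.write(src.read())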
I would probably put the index on S3 or a google bucket though, rather than github
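If you go the S3 route, a minimal sketch could look like this (boto3-based; the bucket, prefix and local dir are placeholders, and it assumes the index was persisted with storage_context.persist()):
Plain Text
import os
import boto3
from llama_index import StorageContext, load_index_from_storage

def load_index_from_s3(bucket, prefix, local_dir="/tmp/index"):
    # download the persisted index files from S3, then load them as usual
    s3 = boto3.client("s3")
    os.makedirs(local_dir, exist_ok=True)
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip "directory" placeholder keys
            continue
        s3.download_file(bucket, key, os.path.join(local_dir, os.path.basename(key)))
    storage_context = StorageContext.from_defaults(persist_dir=local_dir)
    return load_index_from_storage(storage_context)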
This is the function
Plain Text
from llama_index import StorageContext, load_index_from_storage

# TODO: cache index (only the index struct from storage_context)
def load_index_folder(idx_folder, service_context):
    print("Loading indices from [" + idx_folder + "]")
    # rebuild storage context from the persisted folder
    storage_context = StorageContext.from_defaults(persist_dir=idx_folder)

    # load index
    index = load_index_from_storage(storage_context, service_context=service_context)

    return index

I wrote that comment after several tests ... I noticed I can st.cache_data the index_struct before it's converted into a class in load_*indices*_from_storage() ... but it would be simpler if I could just cache the returned index ... The error I got is something like this: https://github.com/jerryjliu/llama_index/issues/886
does st.cache_resource decorator work?
no but I must admit I don't remember the error then 🙂 I'll investigate again tomorrow ...
Yea I've definitely cached the index in a streamlit app before, I'm pretty sure st.cache_resource worked 🤔
yes that's what you did in your git but I think the structure of the whole thing must have changed or something ... not sure ... I'm a nodeJS guy, not a Python guy (and weirdly I'm beginning to like it) and I just discovered today what a "pickle" was 😉
I'll test it later today... maybe it's a streamlit version thing?

lol man i love python. JS/node is ok too, but I really only use it for complex frontends 🙂
I've always had very strong biases against Python ... I was wrong ... there's some very impressive stuff syntactically ... the only thing I dislike is the indentation ... not that I'm a die-hard fan of curly brackets, but they're imho less prone to errors ...
definitely fair, the indenting takes some getting used to lol
Part of the problem is in the definition of the function and its parameters, as they get cached too ...

So I added this one, called by the main program just after the function above:
Plain Text
@st.cache_resource
def cache_index(index):
    return index
The error I get now is
UnhashableParamError: Cannot hash argument 'index' (of type llama_index.indices.vector_store.base.VectorStoreIndex) in 'cache_index'.
If I try with @st.cache_data, the error is the same.
@Logan M I don't know what exactly is unhashable and why ...
You can give the function parameter an underscore to avoid hashing

@st.cache_resource
def cache_index(_index):
    return _index
I think then it only caches the output? Tbh not 100% sure. Every time I've used the underscore trick I had more parameters, so it might be basing the cache off of the other parameters then?
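FWIW, one way to combine both ideas is to make the folder path (a plain string) the cache key and underscore the unhashable objects, so the whole load is cached in one go -- an untested sketch:
Plain Text
import streamlit as st
from llama_index import StorageContext, load_index_from_storage

@st.cache_resource
def load_index_cached(idx_folder, _service_context):
    # idx_folder is hashable and becomes the cache key;
    # the leading underscore tells Streamlit not to hash _service_context
    storage_context = StorageContext.from_defaults(persist_dir=idx_folder)
    return load_index_from_storage(storage_context, service_context=_service_context)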