What is DocArray?

It's like JSON, but for intensive computation. It's like numpy.ndarray, but for unstructured data. It's like pandas.DataFrame, but for nested and mixed media data with embeddings. It's like Protobuf, but for data scientists and deep learning engineers.

If you are a data scientist who works with image, text, video, and audio data in Python all day, you should use DocArray: it can greatly accelerate the work of representing, embedding, matching, visualizing, evaluating, and sharing data while staying close to your favorite toolkits, e.g. Torch, TensorFlow, ONNX, PaddlePaddle, JupyterLab, Google Colab.

If you are a deep learning engineer who works on scalable deep learning services, you should use DocArray: it can be the basic building block of your system. Its portable data structure can be wired in Protobuf, compressed bytes, or JSON, allowing your engineer friends to happily integrate it into the production system.

This is DocArray: a unique one, aiming to be your data structure for unstructured data.

Design

DocArray consists of three simple concepts:

- Document: a data structure for easily representing nested, unstructured data.
- DocumentArray: a container for efficiently accessing, processing, and understanding multiple Documents.
- Dataclass: a high-level API for intuitively representing multimodal data.

DocArray is designed to be extremely intuitive for Python users, with no new syntax to learn. If you know how to Python, you know how to DocArray.

DocArray is designed to maximize the local experience, with the requirement of cloud readiness at any time.

DocArray is designed to represent multimodal data intuitively to face the ever-increasing development of multi/cross-modal applications.

✅ Full support ✔ Limited support ❌ No support

There are three other packages that people often compare DocArray to, yet I haven't used them extensively. It would be unfair to put them in the above list, so here is a dedicated section for them.

Hugging Face datasets is a library for easily accessing and sharing datasets for NLP, computer vision, and audio tasks. One of the highlights is its efficient loading of large datasets, which is highly appreciated during training. In DocArray, there will also be a couple of feature releases soon to allow big data loading with constant memory consumption. However, the biggest difference is that DocArray is focused on data in transit, whereas HF Datasets is about data at rest. DocArray is focused on active data that is subject to frequent change and allows efficient transfer between threads, processes, and microservices.
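To make the idea of a nested document that travels as JSON or compressed bytes more concrete, here is a minimal plain-Python sketch. It deliberately does not use DocArray itself: the `Doc` class, its fields, and the helper functions are hypothetical stand-ins written with the standard library only, so the real library's API may look quite different.

```python
import json
import zlib
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a nested document; not DocArray's API."""
    text: Optional[str] = None
    uri: Optional[str] = None
    embedding: Optional[List[float]] = None
    chunks: List["Doc"] = field(default_factory=list)


def to_json(docs: List[Doc]) -> str:
    # asdict recurses into nested dataclasses, so chunks are serialized too.
    return json.dumps([asdict(d) for d in docs])


def to_compressed_bytes(docs: List[Doc]) -> bytes:
    # Compress the JSON form for cheaper transfer between processes.
    return zlib.compress(to_json(docs).encode("utf-8"))


def from_dict(d: dict) -> Doc:
    # Rebuild the nested structure, recursing into child documents.
    return Doc(
        text=d["text"],
        uri=d["uri"],
        embedding=d["embedding"],
        chunks=[from_dict(c) for c in d["chunks"]],
    )


def from_compressed_bytes(data: bytes) -> List[Doc]:
    return [from_dict(d) for d in json.loads(zlib.decompress(data))]


# One multimodal "page" holding a text chunk and an image chunk.
page = Doc(chunks=[Doc(text="a caption"), Doc(uri="photo.jpg", embedding=[0.1, 0.2])])
restored = from_compressed_bytes(to_compressed_bytes([page]))
```

The round trip returns an equal nested structure, which is the property that lets one process hand such data to another without losing the parent/child relationships.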