Introduction
ProgressiVis is a system and language implementing progressive data analysis and visualization. In the ProgressiVis language, all executions are progressive by design: the system is never blocked by lengthy operations and always shows visualizations quickly, even when not all the data has been loaded yet. ProgressiVis also comes with notebook extensions for creating interactive visualizations and user interfaces that control the progressive process.
When visualizing the results of computations, the visualizations are shown, updated, and improved progressively, every few seconds, until the final result is computed. Alternatively, if the current computation does not converge to the expected result, the user can abort it and try one or several new ones.
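The idea of a result that improves until it is either final or aborted can be sketched in plain Python. This is only an illustration and does not use the ProgressiVis API: the names `progressive_mean` and `run_until_stable`, the chunk size, and the tolerance are all assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def progressive_mean(values: Iterable[float], chunk_size: int) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each chunk of input (hypothetical helper)."""
    count = 0
    total = 0.0
    chunk_fill = 0
    for value in values:
        total += value
        count += 1
        chunk_fill += 1
        if chunk_fill == chunk_size:
            chunk_fill = 0
            yield count, total / count
    if chunk_fill:  # flush the last, partially filled chunk
        yield count, total / count


def run_until_stable(estimates: Iterator[Tuple[int, float]], tolerance: float) -> List[Tuple[int, float]]:
    """Consume partial results and abort once two successive estimates agree."""
    history: List[Tuple[int, float]] = []
    previous = None
    for rows_seen, mean in estimates:
        history.append((rows_seen, mean))
        if previous is not None and abs(mean - previous) < tolerance:
            break  # stands in for the analyst deciding the result is good enough
        previous = mean
    return history


rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(100_000)]
history = run_until_stable(progressive_mean(data, chunk_size=1_000), tolerance=0.01)
for rows_seen, mean in history:
    print(f"{rows_seen:>7} rows -> mean ~ {mean:.3f}")
```

Each printed line plays the role of one visualization update. Stopping the loop early corresponds to the user aborting the computation once the partial result is satisfactory.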
Why?
Interactive data exploration is performed by humans and therefore requires a controlled latency. When the latency exceeds 10 seconds, humans cannot maintain their attention, and their efficiency at exploring data drops dramatically. Instead of loading data fully or running algorithms to completion one after the other, as done in all existing scientific analysis systems, ProgressiVis algorithms (called modules) run in short batches. Each batch is allowed to run only for a specific quantum of time, typically 1 second, after which it produces a usable result and yields control to the next module. To perform the whole computation, ProgressiVis loops over the modules as many times as necessary to converge to a result that the analyst considers satisfactory.
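The batch-and-quantum loop described above can be sketched as a small round-robin scheduler. This is a simplified model, not the ProgressiVis scheduler: the `SumModule` class, its `run_step` and `is_done` methods, and the `run_all` function are hypothetical names chosen for the example.

```python
import time
from typing import List


class SumModule:
    """Hypothetical module that sums its input a slice at a time."""

    def __init__(self, data: List[float], step: int = 1_000) -> None:
        self.data = data
        self.step = step
        self.position = 0
        self.result = 0.0  # a usable partial result at any time

    def is_done(self) -> bool:
        return self.position >= len(self.data)

    def run_step(self, quantum: float) -> None:
        """Process slices until the quantum expires or the input is exhausted."""
        deadline = time.monotonic() + quantum
        while not self.is_done():
            end = self.position + self.step
            self.result += sum(self.data[self.position:end])
            self.position = min(end, len(self.data))
            # The clock is checked after each slice, so every step makes progress.
            if time.monotonic() >= deadline:
                break


def run_all(modules: List[SumModule], quantum: float = 0.01) -> int:
    """Loop over the modules until all are done; return the number of passes."""
    passes = 0
    while any(not m.is_done() for m in modules):
        for module in modules:
            if not module.is_done():
                module.run_step(quantum)  # the module returns control by itself
        passes += 1
    return passes


modules = [SumModule(list(range(200_000))), SumModule([1.0] * 50_000)]
passes = run_all(modules, quantum=0.001)
print(passes, [m.result for m in modules])
```

The number of passes depends on the machine and on the chosen quantum, but `result` holds a valid partial sum after every pass, which is the property a progressive visualization would rely on.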
Humans can then conduct interactive data exploration using large datasets and powerful analysis algorithms, trading time for quality: they stay in control of the quality they need to make decisions by controlling how long they allow an algorithm to run.
ProgressiVis relies on well-known Python libraries, such as numpy, scipy, Pandas, pyarrow, and Scikit-Learn.
For now, ProgressiVis is mostly a proof of concept. The current implementation provides progressive data structures that can grow and adapt to progressive computations, an execution model relying on asynchronous programming with non-preemptive cooperative multitasking, and a number of modules implementing various components of the language: data manipulation, computations, statistics, machine learning, and visualization. ProgressiVis is also meant to be used from Jupyter notebooks; it provides interactions and visualizations specially suited to progressive systems.
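To illustrate what non-preemptive cooperative multitasking over a growing data structure can look like, here is a minimal sketch using only the standard `asyncio` library. It is not how ProgressiVis is implemented: the coroutines `loader` and `running_max`, the plain list used as a growing table, and the `done` event are assumptions made for the example.

```python
import asyncio
from typing import List, Optional, Tuple


async def loader(table: List[int], total_rows: int, chunk: int) -> None:
    """Append rows chunk by chunk, yielding control after each chunk."""
    for start in range(0, total_rows, chunk):
        table.extend(range(start, min(start + chunk, total_rows)))
        await asyncio.sleep(0)  # explicit yield point; nothing preempts this task


async def running_max(table: List[int],
                      snapshots: List[Tuple[int, int]],
                      done: asyncio.Event) -> None:
    """Update the maximum using only the rows added since the previous turn."""
    seen = 0
    current: Optional[int] = None
    while True:
        finished = done.is_set()
        new_rows = table[seen:]
        if new_rows:
            seen += len(new_rows)
            batch_max = max(new_rows)
            current = batch_max if current is None else max(current, batch_max)
            snapshots.append((seen, current))  # one partial result per turn
        if finished:
            break
        await asyncio.sleep(0)  # hand control back to the loader


async def main() -> List[Tuple[int, int]]:
    table: List[int] = []
    snapshots: List[Tuple[int, int]] = []
    done = asyncio.Event()

    async def load_then_signal() -> None:
        await loader(table, total_rows=10_000, chunk=1_000)
        done.set()

    await asyncio.gather(load_then_signal(), running_max(table, snapshots, done))
    return snapshots


snapshots = asyncio.run(main())
print(snapshots[-1])
```

Because tasks switch only at `await` points, each coroutine sees the shared table in a consistent state without locks. Inside a notebook, where an event loop is already running, `await main()` would be used in place of `asyncio.run(main())`.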