Last updated March 18, 2024

Python Guide To Lux: Interactive Visual Discovery

Lux is a Python package that aims to make data exploration easier and quicker with its simple one-line syntax and visualization recommendations.


Published on March 28, 2021

by Aditya Singh

A picture is worth a thousand words, even more so in data-centric projects. Data exploration is the first step in any machine learning project, and it is pivotal to how well the rest of the project turns out. Although libraries like Plotly and Seaborn provide a huge collection of plots and options, they require the user to decide up front what to visualize and what the visualization should look like. This slows exploration down and is a big reason it is often the most time-consuming part of the machine learning life cycle. Well, what if you could get visualizations recommended to you? Lux is a Python package created by the folks at UC Berkeley's RISELab that aims to make data exploration easier and quicker with its simple one-line syntax and visualization recommendations. As the developers put it, “Lux is built on the philosophy that users should always be able to visualize anything they want without having to think about how the visualization should look like”.

In Lux, you don’t explicitly create plots; you simply specify your analysis intent, i.e., which attributes or subsets of the data interest you, and Lux takes care of the rest. Lux is also tightly integrated with Pandas and can be used without modifying any existing code, with just one import statement. It preserves Pandas data frame semantics, so all commands from the Pandas API work in Lux as expected.

Installation

Install Lux from PyPI

 pip install lux-api

Install and activate the Lux notebook extension (lux-widget) included in the package.

For VS Code and Jupyter Notebook

 jupyter nbextension install --py luxwidget
 jupyter nbextension enable --py luxwidget

For JupyterLab

 jupyter labextension install @jupyter-widgets/jupyterlab-manager
 jupyter labextension install luxwidget

Note: Lux does not work in Colab because Colab doesn’t support custom widgets yet.

Other installation methods are described in the Lux documentation.

Data Exploration with Lux

Enable Lux by importing it.

 import pandas as pd
 import lux

That’s it. Now every time you display a data frame in the notebook, you’ll get a toggle to switch between the usual Pandas table and the Lux visualizations. Let’s load some data and try this out.

 df = pd.read_csv("https://raw.githubusercontent.com/Aditya1001001/English-Premier-League/master/EDA_data.csv")
 df

The first set of recommendations created by Lux

This creates several plots divided into three tabs:

Correlation: Visualizes the relationship between pairs of quantitative attributes, arranged from the most to the least correlated pair.
Distribution: Shows histograms of individual quantitative attributes, ranked from the most to the least skewed.
Occurrence: Displays bar charts of individual categorical attributes, ranked from the most to the least uneven.
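To make the ranking idea concrete, here is a minimal, standard-library sketch of the kinds of scores the three tabs describe: absolute Pearson correlation for attribute pairs, skewness for histograms, and distance from a uniform split as an "unevenness" score for categorical counts. It is an illustration under those assumptions, not Lux's own implementation, and the toy columns are made up.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def skewness(xs):
    """Fisher-Pearson skewness: third central moment / second central moment ** 1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def unevenness(values):
    """Euclidean distance between the category proportions and a uniform split."""
    counts = Counter(values)
    total, k = len(values), len(counts)
    return math.sqrt(sum((c / total - 1 / k) ** 2 for c in counts.values()))


# Hypothetical toy columns standing in for a data frame.
quantitative = {
    "overall": [70, 75, 80, 85, 90],
    "value_eur": [1.0, 2.0, 4.0, 8.0, 30.0],
    "age": [30, 22, 27, 25, 24],
}
categorical = {
    "Position": ["Midfielder", "Midfielder", "Midfielder", "Defender", "Forward"],
    "Foot": ["Left", "Right", "Left", "Right", "Left"],
}

# Correlation tab: pairs of quantitative attributes, most to least correlated.
correlation_rank = sorted(
    combinations(quantitative, 2),
    key=lambda pair: abs(pearson(quantitative[pair[0]], quantitative[pair[1]])),
    reverse=True,
)
# Distribution tab: quantitative attributes, most to least skewed.
distribution_rank = sorted(
    quantitative, key=lambda a: abs(skewness(quantitative[a])), reverse=True
)
# Occurrence tab: categorical attributes, most to least uneven.
occurrence_rank = sorted(
    categorical, key=lambda a: unevenness(categorical[a]), reverse=True
)

print(correlation_rank)
print(distribution_rank)
print(occurrence_rank)
```

A symmetric column such as the toy overall ratings scores zero skew and drops to the bottom of the Distribution ranking, while the long-tailed value_eur column rises to the top.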

In addition to visualizing the intermediate steps of data exploration, Lux has a simple language for specifying your analysis intent, i.e., the attributes and values you’re interested in. There are two ways of specifying intent in Lux:

Using the intent property of data frames.
Through lux.Clause objects.

Simple intent specification with intent

The intent property accepts simple string-based descriptions, which makes it a convenient way to specify the intent of analysis.

Specifying attributes of interest

Let’s say value_eur is an attribute of interest:

 df.intent = ['value_eur']
 df

Recommendations created by Lux based on the specified intent

Lux recommends a number of interesting plots in two tabs:

Enhance Tab: Enhance adds one more attribute to the specified attribute, letting the user see how it relates to other attributes. For example, a plot of value_eur vs overall.
Filter Tab: Filter adds a filter to the intended visualization, letting the user quickly browse through subsets of the data. For example, the distribution of value_eur for players with Goals = 1.
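Both tabs can be thought of as small edits to the current intent: Enhance appends one more attribute, Filter appends one attribute=value condition. The sketch below enumerates such candidates for a hypothetical column list, using the same string syntax Lux accepts for intents. The helpers `enhance_candidates` and `filter_candidates` and the toy schema are illustrative only; Lux itself scores and prunes candidates rather than listing every one.

```python
# Hypothetical schema: numeric columns and the distinct values of categorical columns.
NUMERIC = ["value_eur", "overall", "age"]
CATEGORIES = {"Position": ["Defender", "Midfielder", "Forward"], "Goals": [0, 1, 2]}


def enhance_candidates(intent):
    """Each candidate is the current intent plus one attribute not already in it."""
    return [intent + [col] for col in NUMERIC + list(CATEGORIES) if col not in intent]


def filter_candidates(intent):
    """Each candidate is the current intent plus one 'attribute=value' filter."""
    return [
        intent + [f"{col}={value}"]
        for col, values in CATEGORIES.items()
        if col not in intent
        for value in values
    ]


intent = ["value_eur"]
for candidate in enhance_candidates(intent):
    print("Enhance:", candidate)
for candidate in filter_candidates(intent):
    print("Filter:", candidate)
```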

Note also that Lux doesn’t simply create every possible plot; it chooses the plot type and channel mappings (for instance, which attribute goes on which axis) based on a set of visualization best practices.
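As a rough illustration of what such rules can look like, the sketch below picks a mark type and axis mapping from the data types of the attributes in the intent. The rule table (histogram for one quantitative attribute, bar chart for one categorical attribute, scatter plot for two quantitative attributes, aggregated bar chart for a categorical/quantitative pair) is a simplified assumption based on common visualization practice, not Lux's actual rule set, and `choose_chart` is a hypothetical helper.

```python
def choose_chart(attributes):
    """Pick a chart type and channel mapping from (name, data_type) pairs.

    In this simplified sketch, data_type is either "quantitative" or "nominal".
    """
    quant = [name for name, dtype in attributes if dtype == "quantitative"]
    nominal = [name for name, dtype in attributes if dtype == "nominal"]

    if len(quant) == 1 and not nominal:
        # One number: show its distribution, with counts on the y-axis.
        return {"mark": "histogram", "x": quant[0], "y": "count"}
    if len(nominal) == 1 and not quant:
        # One category: count how often each value occurs.
        return {"mark": "bar", "x": "count", "y": nominal[0]}
    if len(quant) == 2 and not nominal:
        # Two numbers: plot one against the other.
        return {"mark": "scatter", "x": quant[0], "y": quant[1]}
    if len(quant) == 1 and len(nominal) == 1:
        # A category and a number: aggregate the number per category (mean here).
        return {"mark": "bar", "x": f"mean({quant[0]})", "y": nominal[0]}
    raise ValueError("combination not covered by this sketch")


print(choose_chart([("value_eur", "quantitative")]))
print(choose_chart([("overall", "quantitative"), ("value_eur", "quantitative")]))
print(choose_chart([("Position", "nominal"), ("value_eur", "quantitative")]))
```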

If there are multiple attributes of interest, they can be mentioned in the form of a list. Let’s say we have two attributes of interest: overall and value_eur.

 df.intent = ['overall','value_eur']
 df

This creates recommendations depicting the effect other attributes and filters have on the specified attributes.

Plot recommendations for multiple attributes of interest

There is also a new tab called Generalize, which recommends plots with one of the specified attributes removed.
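Generalize is the mirror image of Enhance: each candidate drops one clause from the current intent. A minimal sketch, assuming the intent is a plain list of attribute names and filter strings (`generalize_candidates` is a hypothetical helper, not part of Lux):

```python
def generalize_candidates(intent):
    """Return every intent obtained by removing exactly one clause."""
    return [intent[:i] + intent[i + 1:] for i in range(len(intent))]


# With two attributes of interest, each candidate keeps only one of them.
print(generalize_candidates(["overall", "value_eur"]))
# prints [['value_eur'], ['overall']]
```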

Generalize tab of the recommendations for multiple attributes of interest

Specifying a subset of the dataset via filters

Let’s say we are only interested in midfielders.

 df.intent = ["Position=Midfielder"]
 df

This creates the same Correlation, Distribution, and Occurrence plots as before, but using only the midfielder data.

Recommendations created by Lux based on the specified filter

Multiple values of interest can be specified by using the | notation. Let’s say we are interested in midfielders and defenders.

 df.intent = ["Position=Midfielder|Defender"]
 df

Recommendations created by Lux based on the specified filter with multiple values
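To see what the string syntax expresses, here is a minimal sketch that splits each intent string into either a plain attribute or an attribute with one or more accepted values, then keeps the matching rows. The `parse_clause` and `apply_filters` helpers and the sample rows are hypothetical illustrations of the syntax described above, not Lux's internal parser.

```python
def parse_clause(text):
    """Split an intent string into (attribute, accepted values or None)."""
    if "=" not in text:
        return text, None  # plain attribute of interest, e.g. "value_eur"
    attribute, values = text.split("=", 1)
    return attribute, values.split("|")  # "Midfielder|Defender" -> two values


def apply_filters(rows, intent):
    """Keep rows that satisfy every attribute=value clause in the intent."""
    parsed = [parse_clause(clause) for clause in intent]
    filters = [(attr, vals) for attr, vals in parsed if vals is not None]
    return [row for row in rows if all(row[attr] in vals for attr, vals in filters)]


rows = [
    {"name": "A", "Position": "Midfielder"},
    {"name": "B", "Position": "Defender"},
    {"name": "C", "Position": "Forward"},
]
print(parse_clause("value_eur"))
print(parse_clause("Position=Midfielder|Defender"))
print([r["name"] for r in apply_filters(rows, ["Position=Midfielder|Defender"])])
```

The `|` behaves as an "any of these values" test within a single attribute, while separate clauses in the list must all hold.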

Advanced intents with `lux.Clause`

There’s only so much one can accomplish with string-based intent specifications; lux.Clause offers a more expressive way of specifying intent. Additionally, it lets us override auto-inferred details of the plots, such as an attribute’s default axis or the aggregation function used for quantitative attributes.
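Conceptually, a clause is a small record in which any field left unset is inferred automatically, while any field you set wins. The dataclass below is a hypothetical stand-in written only to illustrate that override behaviour; its field names echo the lux.Clause keyword arguments used in this article (attribute, channel, aggregation, bin_size), but it is not Lux's class, and the inferred defaults are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ClauseSketch:
    """Hypothetical stand-in for a clause; None means 'let the tool decide'."""
    attribute: str
    channel: Optional[str] = None      # e.g. "x" or "y"
    aggregation: Optional[str] = None  # e.g. "mean" or "sum"
    bin_size: Optional[int] = None     # histogram bins


# Assumed defaults a tool might infer for a quantitative attribute.
INFERRED = {"channel": "x", "aggregation": "mean", "bin_size": 10}


def resolve(clause):
    """Fill every unset field from INFERRED, keeping user-specified values."""
    overrides = {k: v for k, v in INFERRED.items() if getattr(clause, k) is None}
    return replace(clause, **overrides)


print(resolve(ClauseSketch("overall")))
print(resolve(ClauseSketch("overall", channel="y", aggregation="sum")))
```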

The lux.Clause equivalent for specifying interest in overall would be:

 df.intent = [lux.Clause(attribute='overall')]

Let’s say that we want to create plots with overall on the y-axis.

 df.intent = [lux.Clause(attribute='overall', channel='y')]

Or, to use sum as the aggregation function instead of mean:

 df.intent = ["value_eur", lux.Clause("overall", aggregation="sum")]

Plot recommendations with the aggregation function changed to sum using lux.Clause
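The aggregation mainly matters when values are grouped, for example when a quantitative attribute is summarized per category in a bar chart. The standard-library sketch below, on made-up player rows, shows how the same grouped bars change when the aggregation switches from mean to sum; `aggregate_by` is a hypothetical helper and the numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (Position, overall rating).
rows = [
    ("Forward", 80), ("Forward", 90),
    ("Midfielder", 70), ("Midfielder", 75), ("Midfielder", 80),
    ("Defender", 85),
]


def aggregate_by(rows, func):
    """Group the numeric value by category and reduce each group with func."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: func(values) for category, values in groups.items()}


print(aggregate_by(rows, mean))  # mean: Forward 85, Midfielder 75, Defender 85
print(aggregate_by(rows, sum))   # sum: Forward 170, Midfielder 225, Defender 85
```

Midfielders come out on top under sum only because there are more of them in the toy data, which is why the choice of aggregation is worth making explicitly.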

Create individual visualizations with `Vis` objects

A Vis object represents an individual visualization displayed in Lux. To generate a Vis, you need a source data frame and an intent, written with the same specification syntax described above (intent strings or lux.Clause objects). For example, here we describe our intent to visualize the overall attribute of the data frame df.

 from lux.vis.Vis import Vis
 intent = ["overall"]
 vis = Vis(intent, df)
 vis

You can easily replace a Vis‘s data source and intent without changing its definition. For example, to show the distribution of overall for forwards only, with bin_size set to 50:

 new_intent = [lux.Clause("overall",bin_size=50),"Position=Forward"]
 vis.set_intent(new_intent)
 vis

Plot created by Vis with changed data source and intent
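For intuition about what bin_size controls, here is a minimal equal-width histogram in plain Python. It assumes bin_size is read as the number of equal-width bins rather than the width of each bin; treat that reading, the `histogram` helper, and the sample ratings as assumptions for illustration.

```python
def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max].

    Assumption: `bins` is the number of bins (our reading of bin_size).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


ratings = [62, 65, 68, 70, 71, 73, 75, 78, 81, 88]  # made-up "overall" values
print(histogram(ratings, 5))
```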

You can learn more about Vis in the Lux documentation.

The visualizations can be saved as stand-alone HTML files. The default file name is export.html; you can optionally pass a different file name as the input parameter.

 df.save_as_html('overall_vs_value.html')

Vis objects can also be exported as Altair code or as a Vega-Lite specification.

 vis.to_Altair()
 vis.to_VegaLite()
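For readers who have not seen Vega-Lite before: it is a JSON grammar for charts, so an exported spec is plain data that other tools can render. The hand-written dictionary below shows the general shape of a histogram spec for overall with up to 50 bins; it follows the public Vega-Lite schema but is not the exact output of to_VegaLite(), and the three inline data rows are made up.

```python
import json

# Hand-written Vega-Lite histogram spec (illustrative, not Lux's generated output).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": [{"overall": 70}, {"overall": 75}, {"overall": 88}]},
    "mark": "bar",
    "encoding": {
        # Bin the quantitative field on x into at most 50 bins.
        "x": {"field": "overall", "type": "quantitative", "bin": {"maxbins": 50}},
        # Count the records that fall into each bin on y.
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}

# Serialize to JSON, the form Vega-Lite renderers consume.
print(json.dumps(spec, indent=2))
```

Note that maxbins is an upper bound: Vega-Lite picks "nice" bin boundaries, so the rendered chart may use fewer bins.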

You can find more information about saving and exporting visualizations in the Lux documentation.

Code for the above implementation is available in this Jupyter notebook.