Python Guide To Lux: Interactive Visual Discovery

Lux is a Python package that aims to make data exploration easier and quicker with its simple one-line syntax and visualization recommendations.

Data Exploration with Lux

A picture is worth a thousand words, even more so in data-centric projects. Data exploration is the first step in any machine learning project, and it is pivotal to how well the rest of the project turns out. Although libraries like Plotly and Seaborn provide a huge collection of plots and options, they require the user to decide up front what to visualize and what the visualization should look like. This is not conducive to data exploration and only adds to making it the most time-consuming part of the machine learning life cycle.

Well, what if you could get visualizations recommended to you? Lux is a Python package created by the folks at UC Berkeley's RISELab that aims to make data exploration easier and quicker with its simple one-line syntax and visualization recommendations. As the developers put it, “Lux is built on the philosophy that users should always be able to visualize anything they want without having to think about how the visualization should look like”.

In Lux, you don’t explicitly create plots; you simply specify your analysis intent, i.e., which attributes or subsets interest you, and Lux takes care of the rest. Lux is also tightly integrated with Pandas: it can be used with existing code by adding just one import statement. It preserves Pandas data frame semantics, so all the commands from the Pandas API work in Lux as expected.

Installation

Install Lux from PyPI

pip install lux-api

Install and activate the Lux notebook extension (lux-widget) included in the package.

For VS Code and Jupyter Notebook:
 jupyter nbextension install --py luxwidget
 jupyter nbextension enable --py luxwidget

For JupyterLab:
 jupyter labextension install @jupyter-widgets/jupyterlab-manager
 jupyter labextension install luxwidget

Note: Lux does not work in Colab because Colab doesn’t support custom widgets yet. 

Check other methods of installation here.

Data Exploration with Lux

Enable Lux by importing it.

 import pandas as pd
 import lux 

That’s it. Now every time you print a data frame, you’ll get a toggle option to view the Lux visualizations. Let’s load some data and try this out. 

 df = pd.read_csv("https://raw.githubusercontent.com/Aditya1001001/English-Premier-League/master/EDA_data.csv")
 df 
The first set of recommendations created by Lux

This creates several plots divided into three tabs:

  • Correlation: Visualizes the relationship between pairs of quantitative attributes. The plots are arranged from the most to the least correlated pair of attributes.
  • Distribution: Shows histograms of the quantitative attributes, ranked from the most to the least skewed.
  • Occurrence: Displays bar charts of the categorical attributes, ranked from the most to the least uneven.
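To make the ranking idea concrete, here is a minimal sketch of how attribute pairs could be ordered from most to least correlated. It uses plain Python, made-up toy values, and the absolute Pearson coefficient as the score. The pearson and rank_pairs helpers are hypothetical and are not part of Lux, whose actual scoring may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Return (a, b, r) tuples sorted from most to least correlated by |r|."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


# Made-up toy values, not taken from the article's dataset.
columns = {
    "overall": [60, 70, 80, 90],
    "value_eur": [1.0, 2.0, 4.0, 8.0],
    "age": [30, 22, 27, 25],
}

for a, b, r in rank_pairs(columns):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With these toy numbers, the overall and value_eur pair comes out on top, which is the kind of plot you would expect to see first in the Correlation tab.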

In addition to visualizing the intermediate steps of data exploration, Lux has a simple language for specifying your analysis intent, i.e., the attributes and values you’re interested in. There are two ways of specifying intent in Lux:

Simple intent specification with intent

The intent attribute accepts simple string-based descriptions, a convenient way to specify the intent of an analysis.

Specifying attributes of interest

Let’s say value_eur is an attribute of interest:

 df.intent = ['value_eur']
 df 
Recommendations created by Lux based on specified intent

Lux recommends a number of interesting plots in two tabs:

  • Enhance Tab: Lets the user visualize the relationship between the specified attribute and other attributes. For example, a plot of value_eur vs overall.
  • Filter Tab: Adds filters to the intended visualization, letting the user quickly browse through subsets of the data. For example, the distribution plot for value_eur with Goals = 1.

Note that Lux doesn’t simply create all possible plots; it determines the channel mappings and plot type based on a set of best practices.
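As a rough illustration of what "best practices" can mean here, the sketch below picks a chart type from the data types of the attributes being plotted. The rules mirror the tabs described earlier (histograms for quantitative attributes, bar charts for categorical ones), but choose_chart is a hypothetical helper and a deliberate simplification, not Lux's actual rule set.

```python
def choose_chart(attribute_types):
    """Pick a chart type from the data types of the attributes being plotted.

    attribute_types is a list of "quantitative" or "categorical" strings.
    """
    kinds = sorted(attribute_types)
    if kinds == ["quantitative"]:
        return "histogram"
    if kinds == ["categorical"]:
        return "bar chart of counts"
    if kinds == ["quantitative", "quantitative"]:
        return "scatter plot"
    if kinds == ["categorical", "quantitative"]:
        return "bar chart of an aggregated measure"
    return "no rule in this sketch"


print(choose_chart(["quantitative"]))                  # histogram
print(choose_chart(["quantitative", "quantitative"]))  # scatter plot
print(choose_chart(["quantitative", "categorical"]))   # bar chart of an aggregated measure
```

The point is only that the plot type follows from the attributes you name, so you never have to state it yourself.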

If there are multiple attributes of interest, they can be mentioned in the form of a list. Let’s say we have two attributes of interest: overall and value_eur.

 df.intent = ['overall','value_eur']
 df 

This creates recommendations depicting the effect other attributes and filters have on the specified attributes. 

Plot recommendations for multiple attributes of interest

There is also a new tab called Generalize, which recommends plots with one of the specified attributes removed.

Generalize tab of the recommendations for multiple attributes of interest

Specifying subset of the dataset via filters

Let’s say we are only interested in midfielders.

 df.intent = ["Position=Midfielder"]
 df 

This creates the same correlation, distribution, and occurrence plots as before, but using only the midfielder data.

Recommendations created by Lux based on specified filter

Multiple values of interest can be specified by using the | notation. Let’s say we are interested in midfielders and defenders.

df.intent = ["Position=Midfielder|Defender"]

Recommendations created by Lux based on specified filter with values
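The | notation can be read as "the attribute equals any of these values". The sketch below shows that reading on a few made-up rows in plain Python. parse_filter and apply_filter are hypothetical helpers written for illustration; this is not how Lux parses intents internally.

```python
def parse_filter(spec):
    """Split 'Attribute=Value1|Value2' into (attribute, [values])."""
    attribute, _, raw_values = spec.partition("=")
    return attribute, raw_values.split("|")


def apply_filter(rows, spec):
    """Keep the rows whose attribute matches any of the listed values."""
    attribute, values = parse_filter(spec)
    return [row for row in rows if row[attribute] in values]


# Made-up rows, not taken from the article's dataset.
rows = [
    {"name": "A", "Position": "Midfielder"},
    {"name": "B", "Position": "Defender"},
    {"name": "C", "Position": "Forward"},
]

selected = apply_filter(rows, "Position=Midfielder|Defender")
print([row["name"] for row in selected])  # ['A', 'B']
```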

Advanced intents with lux.Clause

There’s only so much one can accomplish with string-based intent specifications. lux.Clause offers a more expressive way of specifying intent. It also allows us to override auto-inferred details about the plots, such as an attribute’s default axis or the aggregation function used for quantitative attributes.

The lux.Clause equivalent for specifying interest in overall would be: 

df.intent = [lux.Clause(attribute='overall')]

Let’s say that we want to create plots with overall on the y-axis.

df.intent = [lux.Clause(attribute='overall', channel='y')]

Plot with axis swapped with lux.Clause

Or say we want to use sum as the aggregation function instead of mean.

df.intent = ["value_eur",lux.Clause("overall",aggregation="sum")]

Plot recommendations with aggregation function changed to sum with lux.Clause
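To see why the choice of aggregation function matters, here is a small plain-Python sketch that groups made-up rows by Position and aggregates overall with mean and then with sum. The aggregate helper is hypothetical and only illustrates the idea; it is not Lux code.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, group_key, value_key, func):
    """Group rows by group_key and reduce each group's values with func."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: func(values) for group, values in groups.items()}


# Made-up rows, not taken from the article's dataset.
rows = [
    {"Position": "Midfielder", "overall": 70},
    {"Position": "Midfielder", "overall": 80},
    {"Position": "Forward", "overall": 90},
]

by_mean = aggregate(rows, "Position", "overall", mean)
by_sum = aggregate(rows, "Position", "overall", sum)
print(by_mean)
print(by_sum)
```

With mean, the forward group has the taller bar (90 against 75). With sum, the midfielders overtake it (150 against 90) simply because there are more of them, so the two settings can tell quite different stories.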

Create individual visualizations with Vis objects

A Vis object represents an individual visualization displayed in Lux. To generate a Vis, you need a source data frame and an analysis intent as inputs. The intent is expressed using the same specification as before, using either strings or lux.Clause. For example, here we describe our intent to visualize the overall attribute on the data frame df.

 from lux.vis.Vis import Vis
 intent = ["overall"]
 vis = Vis(intent,df)
 vis
Plot created by Vis

You can easily replace a Vis’s data source and intent without changing its definition. For example, to show the distribution of overall on the subset of the data containing only forwards, with bin_size set to 50:

 new_intent = [lux.Clause("overall",bin_size=50),"Position=Forward"]
 vis.set_intent(new_intent)
 vis 
Plot created by Vis with changed data source and intent
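For intuition about what a binning parameter does to a histogram, here is a minimal equal-width binning sketch in plain Python. It assumes the parameter is read as the number of bins, which is an assumption about bin_size worth checking against the Lux documentation. The histogram helper and the values are made up for illustration.

```python
def histogram(values, num_bins):
    """Count values in num_bins equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    counts = [0] * num_bins
    if high == low:
        # All values are identical, so they all fall into the first bin.
        counts[0] = len(values)
        return counts
    width = (high - low) / num_bins
    for value in values:
        # The maximum value belongs to the last bin rather than a new one.
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up ratings, not taken from the article's dataset.
ratings = [50, 55, 60, 65, 70, 75, 80, 85, 90]
print(histogram(ratings, 4))  # [2, 2, 2, 3]
print(histogram(ratings, 2))  # [4, 5]
```

More bins give a finer view of the distribution, at the cost of fewer points in each bar.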

You can learn more about Vis here.

The visualizations can be stored as stand-alone HTML files. The default file name is export.html; you can optionally pass a different HTML filename as the input parameter.

df.save_as_html('overall_vs_value.html')

Vis objects can also be exported to code in Altair or as Vega-Lite.

vis.to_Altair()

vis.to_VegaLite()

You can find more information about saving and exporting visualizations here.

Code for the above implementation is available in this Jupyter notebook.


Aditya Singh

A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.