Quick Start

Let's get started with Frictionless! We will learn how to install and use the framework. The simple example below will showcase the framework's basic functionality. For an introduction to the concepts behind the Frictionless Framework, please read the Frictionless Introduction.

Installation

The framework requires Python 3.6+. Versioning follows the SemVer Standard.

CLI

pip install frictionless
pip install frictionless[sql] # to install a core plugin (optional)
pip install 'frictionless[sql]' # for zsh shell

The framework supports CSV, Excel, and JSON formats by default. The second command above installs a plugin for SQL support. There are plugins for SQL, Pandas, HTML, and others (check the list of Frictionless Framework plugins and their status). Usually, you don't need to think about this in advance: Frictionless will display a useful error message about a missing plugin, including installation instructions.

Troubleshooting

Did you have an error installing Frictionless? Here are some dependencies and common errors:

pip: command not found. Please see the pip docs for help installing pip.
Installing Python help (Mac)
Installing Python help (Windows)

Still having a problem? Ask us for help on our Discord chat or open an issue. We're happy to help!

Usage

The framework can be used:

as a Python library
as a command-line interface
as a RESTful API server (for advanced use cases)

For instance, the CLI call below reads the rows of a table, and the Python and API interfaces expose the same operation:

CLI

frictionless extract data/table.csv

All these interfaces are as much alike as possible in their naming conventions and in the way you interact with them. Usually, it's straightforward to translate, for instance, Python code to a command-line call. Frictionless provides code completion for Python and the command line, which should help you get useful hints in real time. You can find the API reference here.

Arguments conform to the following naming convention:

for Python interfaces, they are snake_cased, e.g. missing_values
within dictionaries or JSON objects, they are camelCased, e.g. missingValues
in the command line, they use dashes, e.g. --missing-values
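The convention above is mechanical, so one spelling can be derived from another. Below is a minimal sketch of that mapping; the helper names `to_camel_case` and `to_cli_flag` are hypothetical and are not part of the Frictionless API.

```python
def to_camel_case(name):
    """Convert a snake_cased name to camelCase (dictionary/JSON style)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_cli_flag(name):
    """Convert a snake_cased name to a dashed command-line flag."""
    return "--" + name.replace("_", "-")


# Hypothetical helpers applied to the argument used in the list above.
print(to_camel_case("missing_values"))
print(to_cli_flag("missing_values"))
```

Starting from the Python spelling keeps the sketch simple, since snake_case splits unambiguously on underscores.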

To get the documentation for a command-line interface, just use the --help flag:

CLI

frictionless --help
frictionless describe --help
frictionless extract --help
frictionless validate --help
frictionless transform --help

Example

Download invalid.csv to reproduce the examples (right-click and "Save link as"). For more examples, please take a look at the Basic Examples article.

We will take a very messy data file:

CLI

cat invalid.csv

invalid.csv

id,name,,name
1,english
1,english

2,german,1,2,3

First of all, let's use describe to infer the metadata directly from the tabular data. We can then edit and save it to provide others with useful information about the data:

CLI

frictionless describe invalid.csv

# --------
# metadata: invalid.csv
# --------

encoding: utf-8
format: csv
hashing: md5
name: invalid
path: invalid.csv
profile: tabular-data-resource
schema:
  fields:
    - name: id
      type: integer
    - name: name
      type: string
    - name: field3
      type: integer
    - name: name2
      type: integer
scheme: file

This output is in YAML, the default Frictionless output format.
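To make the inference step more concrete, here is a rough standard-library sketch that arrives at the same field names and types for this particular file. It is not how Frictionless implements describe: the helpers `normalize_labels` and `guess_type` are hypothetical, and the rules (a positional name for a blank label, a counter suffix for a repeated label, integer only when every non-empty value is made of digits) are assumptions chosen to match the output above.

```python
import csv
import io

# Same content as invalid.csv, embedded so the sketch is self-contained.
INVALID_CSV = "id,name,,name\n1,english\n1,english\n\n2,german,1,2,3\n"


def normalize_labels(labels):
    """Name blank labels by position and number repeated labels (assumed rule)."""
    names, counts = [], {}
    for position, label in enumerate(labels, start=1):
        name = label or f"field{position}"
        counts[name] = counts.get(name, 0) + 1
        if counts[name] > 1:
            name = f"{name}{counts[name]}"
        names.append(name)
    return names


def guess_type(values):
    """Return 'integer' if all non-empty values are digits, else 'string'."""
    present = [value for value in values if value != ""]
    if present and all(value.lstrip("-").isdigit() for value in present):
        return "integer"
    return "string"


rows = list(csv.reader(io.StringIO(INVALID_CSV)))
names = normalize_labels(rows[0])
fields = []
for index, name in enumerate(names):
    # Short and blank rows simply contribute no value to this column.
    column = [row[index] for row in rows[1:] if index < len(row)]
    fields.append({"name": name, "type": guess_type(column)})

for field in fields:
    print(field)
```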

Now that we have inferred a table schema from the data file (the expected format of the table and the expected type of each value in a column), we can use extract to read the normalized tabular data from the source CSV file:

CLI

frictionless extract invalid.csv

# ----
# data: invalid.csv
# ----

==  =======  ======  =====
id  name     field3  name2
==  =======  ======  =====
 1  english
 1  english

 2  german        1      2
==  =======  ======  =====
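The table above shows short rows padded with empty cells and the extra cell of the last row dropped. A minimal sketch of that kind of normalization, using only the standard library, could look like the following. The helper `normalize_row` is hypothetical, the sketch leaves values as strings rather than casting them to the schema types, and it is not a description of what Frictionless does internally.

```python
import csv
import io

# Same content as invalid.csv, embedded so the sketch is self-contained.
INVALID_CSV = "id,name,,name\n1,english\n1,english\n\n2,german,1,2,3\n"


def normalize_row(cells, width):
    """Pad a short row with None, drop cells beyond width, map '' to None."""
    padded = (cells + [None] * width)[:width]
    return [cell if cell not in ("", None) else None for cell in padded]


rows = list(csv.reader(io.StringIO(INVALID_CSV)))
width = len(rows[0])  # the header defines the number of fields
data = [normalize_row(row, width) for row in rows[1:]]

for row in data:
    print(row)
```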

Last but not least, let's get a validation report. This report will help us to identify and fix all the errors present in the tabular data, as comprehensive information is provided for every problem:

CLI

frictionless validate invalid.csv

# -------
# invalid: invalid.csv
# -------

===  =====  ===============  ====================================================================================
row  field  code             message
===  =====  ===============  ====================================================================================
         3  blank-label      Label in the header in field at position "3" is blank
         4  duplicate-label  Label "name" in the header at position "4" is duplicated to a label: at position "2"
  2      3  missing-cell     Row at position "2" has a missing cell in field "field3" at position "3"
  2      4  missing-cell     Row at position "2" has a missing cell in field "name2" at position "4"
  3      3  missing-cell     Row at position "3" has a missing cell in field "field3" at position "3"
  3      4  missing-cell     Row at position "3" has a missing cell in field "name2" at position "4"
  4         blank-row        Row at position "4" is completely blank
  5      5  extra-cell       Row at position "5" has an extra value in field at position "5"
===  =====  ===============  ====================================================================================
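To show where the entries in this report come from, the sketch below re-creates the same row, field, and code triples with the standard library. It is a simplified illustration under stated assumptions, not the Frictionless validation logic: the function `check_table` is hypothetical, it covers only the five error codes seen above, and it counts the header as row 1, as the report does.

```python
import csv
import io

# Same content as invalid.csv, embedded so the sketch is self-contained.
INVALID_CSV = "id,name,,name\n1,english\n1,english\n\n2,german,1,2,3\n"


def check_table(text):
    """Return (row, field, code) tuples for a few structural problems."""
    rows = list(csv.reader(io.StringIO(text)))
    header, errors, seen = rows[0], [], set()
    # Header checks: blank and repeated labels.
    for position, label in enumerate(header, start=1):
        if label == "":
            errors.append((None, position, "blank-label"))
        elif label in seen:
            errors.append((None, position, "duplicate-label"))
        seen.add(label)
    # Row checks: positions are 1-based and the header is row 1.
    for row_position, cells in enumerate(rows[1:], start=2):
        if not cells:
            errors.append((row_position, None, "blank-row"))
            continue
        for position in range(len(cells) + 1, len(header) + 1):
            errors.append((row_position, position, "missing-cell"))
        for position in range(len(header) + 1, len(cells) + 1):
            errors.append((row_position, position, "extra-cell"))
    return errors


errors = check_table(INVALID_CSV)
for error in errors:
    print(error)
```

Each tuple corresponds to one line of the report, which makes it easy to see that a single ragged row can produce several missing-cell entries.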

Now that we have all this information:

we can clean up the table to ensure data quality
we can use the metadata to describe and share the dataset
we can include validation in our workflow to guarantee the data stays valid
and much more: don't hesitate to read the following sections of the documentation!