Resource Guide
The Resource class is arguably the most important class of the whole Frictionless Framework. It's based on the Data Resource Spec and the Tabular Data Resource Spec.
Creating Resource
Let's create a data resource:
As you can see, it's possible to create a resource from different kinds of sources, and Frictionless detects the source type automatically (e.g. whether it's a descriptor or a path). It's possible to make this step more explicit:
Describing Resource
The specs support a great deal of resource metadata, and all of it is available in Frictionless Framework too:
If you have created a resource, for example from a descriptor, you can access these properties:
And edit them:
Saving Descriptor
Like any of the Metadata classes, the Resource class can be saved as JSON or YAML:
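As a rough, standard-library-only illustration of what saving a descriptor amounts to, the sketch below writes a plain descriptor dictionary to JSON and reads it back. The descriptor values and the `resource.json` file name are assumptions for illustration, and `save_descriptor`/`load_descriptor` are hypothetical helpers, not the Frictionless API. YAML output would additionally need a third-party library such as PyYAML.

```python
import json
import os
import tempfile

# A hypothetical resource descriptor; keys follow the Data Resource Spec
descriptor = {
    "name": "table",
    "path": "table.csv",
    "scheme": "file",
    "format": "csv",
    "encoding": "utf-8",
}

def save_descriptor(descriptor, path):
    """Hypothetical helper: write a descriptor dict to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(descriptor, file, indent=2)

def load_descriptor(path):
    """Hypothetical helper: read a JSON descriptor back into a dict."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "resource.json")
    save_descriptor(descriptor, target)
    restored = load_descriptor(target)

print(restored == descriptor)  # True
```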
Resource Lifecycle
You might have noticed that we had to duplicate the with Resource(...) statement in some examples. The reason is that Resource is a streaming interface: once it's read, you need to open it again. Let's show it in an example:
At the same time, you can read data from a resource without opening and closing it explicitly. In this case Frictionless Framework will open and close the resource for you, so it will basically be a one-time operation:
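The streaming behaviour can be mimicked with a small stand-in class. This is a minimal sketch under the assumption that a resource wraps a single underlying stream; `StreamingTable`, `row_stream` and `read_rows` here are hypothetical names, not the Frictionless API. It shows that a second pass over an already-read stream yields nothing, while a one-time read opens and closes the stream by itself.

```python
import csv
import io

class StreamingTable:
    """Hypothetical stand-in for a streaming resource (not the Frictionless API).

    Rows can be iterated only once per open(); afterwards the underlying
    stream is exhausted and must be reopened.
    """

    def __init__(self, text):
        self._text = text
        self._stream = None

    def open(self):
        self._stream = io.StringIO(self._text)
        return self

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None

    def __enter__(self):
        return self.open()

    def __exit__(self, *exc):
        self.close()

    def row_stream(self):
        # csv.reader consumes the underlying stream lazily
        return csv.reader(self._stream)

    def read_rows(self):
        # One-time operation: open, read everything, close
        with self:
            return list(self.row_stream())

table = StreamingTable("id,name\n1,english\n2,german\n")

with table:
    first_pass = list(table.row_stream())
    second_pass = list(table.row_stream())  # the stream is already exhausted

print(len(first_pass), len(second_pass))  # 3 0
print(table.read_rows() == first_pass)    # True
```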
Reading Data
The Resource class is also a metadata class which provides various read and stream functions. The extract functions always read rows into memory; Resource can do the same, but it also gives you a choice regarding the output data. It can be rows, data, text, or bytes. Let's try reading all of them:
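To make the four output levels concrete, here is a standard-library sketch of how they relate for a small CSV file: bytes are the raw content, text is the decoded content, data is a list of cell lists, and rows are records keyed by the header. The helper names mirror the levels but are hypothetical; in particular, these rows are plain dicts of untyped strings, which is an assumption of the sketch rather than what Frictionless returns.

```python
import csv
import io

# Hypothetical raw file contents
source = "id,name\n1,english\n2,german\n".encode("utf-8")

def read_bytes(raw):
    return raw

def read_text(raw, encoding="utf-8"):
    return raw.decode(encoding)

def read_data(raw, encoding="utf-8"):
    # A list of lists of cells, header row included
    return list(csv.reader(io.StringIO(read_text(raw, encoding))))

def read_rows(raw, encoding="utf-8"):
    # A list of dicts keyed by the header row; values stay untyped strings
    header, *body = read_data(raw, encoding)
    return [dict(zip(header, cells)) for cells in body]

print(read_bytes(source))  # b'id,name\n1,english\n2,german\n'
print(read_text(source))
print(read_data(source))   # [['id', 'name'], ['1', 'english'], ['2', 'german']]
print(read_rows(source))   # [{'id': '1', 'name': 'english'}, {'id': '2', 'name': 'german'}]
```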
It's really handy to read all your data into memory, but it's not always possible if a file is really big. For such cases, Frictionless provides streaming functions:
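The idea behind streaming can be sketched with a Python generator that yields one row at a time, so a consumer such as a running sum never needs the whole table in memory. `row_stream` is a hypothetical helper, not the Frictionless function of a similar name, and the in-memory buffer only stands in for a large file on disk.

```python
import csv
import io

def row_stream(file):
    """Hypothetical helper: yield one dict per data row, reading lazily from `file`."""
    reader = csv.reader(file)
    header = next(reader)
    for cells in reader:
        yield dict(zip(header, cells))

# An in-memory buffer stands in for a large file on disk
big_file = io.StringIO("id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1, 1001)))

total = 0
for row in row_stream(big_file):
    total += int(row["value"])  # only the current row is processed at a time

print(total)
```

With a real file you would pass an open file object (opened with `newline=""`, as the csv module recommends) instead of the buffer.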
File Details
Let's overview the details we can specify for a file. Usually you don't need to provide those details, as Frictionless is capable of inferring them on its own. However, there are situations when you need to specify them manually. The following example will use the Resource class, but the same options can be used for the extract and extract_table functions.
Scheme
The scheme, also known as the protocol, indicates which loader Frictionless should use to read or write data. It can be file (default), text, http, https, s3, and others.
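A minimal sketch of scheme detection, assuming the scheme is simply the URL prefix of the path and that plain paths fall back to file. `detect_scheme` is a hypothetical helper built on `urllib.parse.urlparse`; it is not how Frictionless itself is implemented.

```python
from urllib.parse import urlparse

def detect_scheme(path):
    """Hypothetical helper: return the URL scheme of `path`, defaulting to "file"."""
    scheme = urlparse(path).scheme
    # Single letters are most likely Windows drive letters such as "C:"
    if not scheme or len(scheme) == 1:
        return "file"
    return scheme

for path in ["data/table.csv", "https://example.com/table.csv", "s3://bucket/table.csv"]:
    print(path, "->", detect_scheme(path))
```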
Format
The format, also called the extension, helps Frictionless choose a proper parser to handle the file. Popular formats are csv, xlsx, json, and others.
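Extension-based format detection can be sketched as taking the lowercased suffix of the path, ignoring any URL query string. `detect_format` is a hypothetical helper; a real detector would also have to strip compression suffixes such as `.gz` before looking at the format.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def detect_format(path):
    """Hypothetical helper: guess the format from the file extension."""
    # Use only the URL path so query strings don't leak into the extension
    suffix = PurePosixPath(urlparse(path).path).suffix
    return suffix[1:].lower()

print(detect_format("data/table.csv"))                      # csv
print(detect_format("https://example.com/table.XLSX?x=1"))  # xlsx
```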
Hashing
The hashing option controls which hashing algorithm should be used for generating the hash property. It doesn't affect the extract function but can be used with the Resource class:
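Choosing a hashing algorithm by name maps naturally onto `hashlib.new`. The sketch below is a hypothetical `compute_hash` helper, not the Frictionless implementation; the md5 default is an assumption of the sketch.

```python
import hashlib

def compute_hash(data, hashing="md5"):
    """Hypothetical helper: hash raw bytes with the named algorithm, e.g. "md5" or "sha256"."""
    hasher = hashlib.new(hashing)
    hasher.update(data)
    return hasher.hexdigest()

data = b"id,name\n1,english\n2,german\n"
print(compute_hash(data))            # 32 hex characters
print(compute_hash(data, "sha256"))  # 64 hex characters
```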
Encoding
Frictionless automatically detects the encoding of files, but sometimes the detection can be inaccurate. It's possible to provide an encoding manually:
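Why an explicit encoding can matter is easy to show with a naive detector that tries UTF-8 and falls back to latin-1. `decode` is a hypothetical helper and its fallback rule is an assumption, not Frictionless's detection logic. For a cp1252 file containing a euro sign, the guess silently produces a control character, while the explicit encoding decodes correctly.

```python
def decode(raw, encoding=None):
    """Hypothetical helper: decode bytes, guessing the encoding unless one is given."""
    if encoding is not None:
        return raw.decode(encoding)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Naive fallback: latin-1 maps every byte, so it never fails,
        # but it may silently produce the wrong characters
        return raw.decode("latin-1")

raw = "price\n10 €\n".encode("cp1252")
guessed = decode(raw)             # the euro sign comes out as the control character "\x80"
explicit = decode(raw, "cp1252")  # "price\n10 €\n"
print(repr(guessed), repr(explicit))
```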
Innerpath
By default, Frictionless uses the first file found in a zip archive. It's possible to adjust this behaviour:
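The innerpath idea can be sketched with the standard `zipfile` module: without an innerpath, read the first archive member; with one, read that member instead. `read_inner` is a hypothetical helper, and the archive with two CSV files is built in memory for the example.

```python
import io
import zipfile

# Build a small zip archive in memory with two CSV files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("table1.csv", "id,name\n1,english\n")
    archive.writestr("table2.csv", "id,name\n1,german\n")

def read_inner(zip_bytes, innerpath=None):
    """Hypothetical helper: return the text of `innerpath`, or of the first member if None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = innerpath or archive.namelist()[0]
        return archive.read(name).decode("utf-8")

zip_bytes = buffer.getvalue()
print(read_inner(zip_bytes))                # contents of table1.csv
print(read_inner(zip_bytes, "table2.csv"))  # contents of table2.csv
```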
Compression
It's possible to adjust compression detection by providing the algorithm explicitly. For the example below this isn't required, as the compression would be detected anyway:
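Here is a sketch of detected versus explicit compression for gzip. The hypothetical `decompress` helper detects gzip by its magic bytes; that detection rule is an assumption of the sketch and not necessarily how Frictionless decides, which may rely on the file extension instead.

```python
import gzip

def decompress(raw, compression=None):
    """Hypothetical helper: decompress `raw`, detecting gzip unless `compression` is given."""
    if compression is None:
        # gzip streams always start with the bytes 0x1f 0x8b
        compression = "gz" if raw[:2] == b"\x1f\x8b" else ""
    if compression == "gz":
        return gzip.decompress(raw)
    if compression == "":
        return raw
    raise ValueError(f"unsupported compression: {compression}")

raw = gzip.compress(b"id,name\n1,english\n")
print(decompress(raw))        # detected automatically
print(decompress(raw, "gz"))  # same result, given explicitly
```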
Control
The Control object allows you to manage the loader used by the Resource class. In most cases you don't need to provide any Control settings, but sometimes they can be useful:
Exact parameters depend on schemes and can be found in the "Schemes Reference". For example, the Remote Control provides http_timeout, http_session, and others, but there is only one option available for all controls:
Dialect
The Dialect adjusts the way the parsers work. The concept is similar to the Control above. Let's use the CSV Dialect to adjust the delimiter configuration:
There are a great many options available for different dialects, which can be found in the "Formats Reference". We will list the properties that can be used with every dialect:
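Python's own `csv` module has a comparable notion of dialect, so the effect of a delimiter setting can be shown without Frictionless. This is an analogy only: it uses the standard library's `delimiter` argument, not the Frictionless CSV Dialect class.

```python
import csv
import io

text = "id;name\n1;english\n2;german\n"

# With the default comma delimiter every line is parsed as a single cell
default = list(csv.reader(io.StringIO(text)))
# Setting the delimiter splits the cells correctly
adjusted = list(csv.reader(io.StringIO(text), delimiter=";"))

print(default[0])   # ['id;name']
print(adjusted[0])  # ['id', 'name']
```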
Table Details
The core concepts for a tabular resource are Layout and Schema.
Layout
Please read the Layout Guide for more information.
Schema
Please read the Schema Guide for more information.
Stats
A resource's stats can be accessed with resource.stats:
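As a rough illustration of the kind of figures such stats contain, the sketch below computes a hash, a byte count, a field count and a row count for CSV bytes with the standard library. The exact keys Frictionless reports and how it counts them may differ; `compute_stats` is a hypothetical helper and the md5 choice is an assumption.

```python
import csv
import hashlib
import io

def compute_stats(raw, encoding="utf-8"):
    """Hypothetical helper: basic stats for CSV bytes."""
    header, *body = csv.reader(io.StringIO(raw.decode(encoding)))
    return {
        "hash": hashlib.md5(raw).hexdigest(),
        "bytes": len(raw),
        "fields": len(header),
        "rows": len(body),  # data rows only, header excluded
    }

raw = "id,name\n1,english\n2,german\n".encode("utf-8")
stats = compute_stats(raw)
print(stats)
```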
Resource Options
Extraction functions and classes accept a few options that are needed to manage integrity behaviour:
Basepath
Makes all paths be treated as relative to this path.
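Treating paths as relative to a basepath boils down to joining the two. A minimal sketch with a hypothetical `resolve` helper, assuming POSIX-style separators:

```python
from pathlib import PurePosixPath

def resolve(path, basepath=""):
    """Hypothetical helper: join a relative resource path onto `basepath`."""
    return str(PurePosixPath(basepath) / path) if basepath else path

print(resolve("table.csv", "data"))  # data/table.csv
print(resolve("table.csv"))          # table.csv
```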
Detector
A Detector object to tweak metadata detection.
Onerror
This option accepts one of three possible values configuring the behaviour of extract, Resource, or Package if there is an error during the row reading process:
- ignore (default)
- warn
- raise
Let's investigate how we can emit warnings on all header/row errors:
In some cases, we need to fail on the first error. We will use raise for it:
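The three policies can be sketched as a single row-checking loop that either ignores, warns about, or raises on a malformed row. The width check, `RowError` and `read_rows` are hypothetical stand-ins for illustration; Frictionless checks far more than row width and records errors on its own row objects.

```python
import warnings

class RowError(Exception):
    """Hypothetical error raised for an invalid row."""

def read_rows(rows, expected_width, onerror="ignore"):
    """Hypothetical helper: collect rows, handling width errors per `onerror`."""
    if onerror not in ("ignore", "warn", "raise"):
        raise ValueError(f"invalid onerror value: {onerror}")
    result = []
    for number, cells in enumerate(rows, start=1):
        if len(cells) != expected_width:
            message = f"row {number} has {len(cells)} cells, expected {expected_width}"
            if onerror == "raise":
                raise RowError(message)
            if onerror == "warn":
                warnings.warn(message, UserWarning)
        result.append(cells)
    return result

rows = [["1", "english"], ["2"], ["3", "german"]]

ignored = read_rows(rows, 2)  # invalid rows pass through silently

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = read_rows(rows, 2, onerror="warn")

try:
    read_rows(rows, 2, onerror="raise")
except RowError as error:
    failure = str(error)

print(failure)  # row 2 has 1 cells, expected 2
```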
Trusted
By default, an error will be raised on unsafe paths. Setting trusted to True will disable this behaviour.
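A sketch of what an unsafe-path check with a trusted override might look like. The rules here (reject absolute paths and ".." components) are assumptions for illustration, and `is_safe_path`/`check_path` are hypothetical helpers; Frictionless's actual rules may differ.

```python
from pathlib import PurePosixPath

def is_safe_path(path):
    """Hypothetical check: reject absolute paths and parent-directory traversal."""
    parts = PurePosixPath(path)
    return not parts.is_absolute() and ".." not in parts.parts

def check_path(path, trusted=False):
    """Hypothetical helper: raise on unsafe paths unless `trusted` is True."""
    if not trusted and not is_safe_path(path):
        raise ValueError(f"unsafe path: {path}")
    return path

print(is_safe_path("data/table.csv"))  # True
print(is_safe_path("../secret.csv"))   # False
print(check_path("/etc/passwd", trusted=True))
```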