Object/File Storage public REST API

Romain Rigaux
Published in Data Querying
Aug 13, 2021 · 5 min read

Leverage a REST API to simplify your data file interactions, like listing, uploading and downloading files in the public cloud object stores.

Same file operations as in the Web App available as REST API calls

This post comes with a live tutorial of the Hue file listing API via the demo environment demo.gethue.com.

Background: the Hue SQL Editor project has been evolving for more than 10 years and allows you to query any Database or Data Warehouse.

Recently, as described in the SQL Editor API post, all the end-user functionality and the under-the-hood integration grunt work can now simply be reused programmatically (freeing up time to let you focus on the data work itself).

The main use cases for the File API are to upload data and create a SQL table on top of it, or to retrieve those pesky file URIs:

Quick Path copy or open file in the Create Table Wizard

The API leverages the standard credentials of your users (SSO via LDAP, SAML…) and works the same as if they were interacting via the Web UI directly. As a bonus, it is cloud agnostic, so nobody is required to learn the intricacies of each provider; everybody simply uses an interface they are already familiar with.

API Demo

The simplest operation is to list the content of your buckets or directories (also known as “list dir”).

Start by authenticating and asking for an API access token (also known as JWT):

curl -X POST https://demo.gethue.com/api/token/auth -d 'username=demo&password=demo'

{"refresh":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoicmVmcmVzaCIsImV4cCI6MTYyOTQ3MTE0MiwianRpIjoiYjNkMDUzN2I1OGU5NDNlZGE0OTJiYzVmOTkzMDEwOTEiLCJ1c2VyX2lkIjoyfQ._MXo09PzisvqY7-1NMVIaLiUCVksYx2ZA5v_PWTk0TY","access":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"}

Then provide this access value in each subsequent call. Update the examples below with your own token:

Authorization: Bearer <Your "access" value here>
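
If you script these calls, a small shell helper can fetch the token once and reuse it. This is a minimal sketch, not part of the API itself: it assumes the jq CLI is available to parse the JSON response, and it introduces an ACCESS_TOKEN variable that the examples below could reference instead of pasting the raw token.

# Authenticate once and keep the access token in a shell variable (jq is assumed to be installed)
ACCESS_TOKEN=$(curl -s -X POST https://demo.gethue.com/api/token/auth -d 'username=demo&password=demo' | jq -r '.access')

# Every call below then just adds: -H "Authorization: Bearer $ACCESS_TOKEN"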

Here is how to list the content of a path, in this case the S3 bucket s3a://demo-gethue:

curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"

{
"path": "s3a://demo-gethue",
"breadcrumbs": [
{
"url": "s3a%3A%2F%2F",
"label": "s3a://"
},
{
"url": "s3a%3A%2F%2Fdemo-gethue",
"label": "demo-gethue"
}
],
"current_request_path": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue",
"is_trash_enabled": false,
"files": [
{
"path": "s3a://",
"name": "..",
"stats": {
"path": "s3a://",
"size": 0,
"atime": null,
"mtime": null,
"mode": 16895,
"user": "",
"group": "",
"aclBit": false
},
"mtime": "",
"humansize": "0 bytes",
"type": "dir",
"rwx": "drwxrwxrwx",
"mode": "40777",
"url": "/filebrowser/view=s3a%3A%2F%2F",
"is_sentry_managed": false
},
{
"path": "s3a://demo-gethue",
"name": ".",
"stats": {
"path": "s3a://demo-gethue",
"size": 0,
"atime": 1628866612,
"mtime": 1628866612,
"mode": 16895,
"user": "",
"group": "",
"aclBit": false
},
"mtime": "August 13, 2021 02:56 PM",
"humansize": "0 bytes",
"type": "dir",
"rwx": "drwxrwxrwx",
"mode": "40777",
"url": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue",
"is_sentry_managed": false
},
{
"path": "s3a://demo-gethue/data",
"name": "data",
"stats": {
"path": "s3a://demo-gethue/data/",
"size": 0,
"atime": null,
"mtime": null,
"mode": 16895,
"user": "",
"group": "",
"aclBit": false
},
"mtime": "",
"humansize": "0 bytes",
"type": "dir",
"rwx": "drwxrwxrwx",
"mode": "40777",
"url": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue%2Fdata",
"is_sentry_managed": false
}
],
"page": {
"number": 1,
"num_pages": 1,
"previous_page_number": 0,
"next_page_number": 0,
"start_index": 1,
"end_index": 1,
"total_count": 1
},
"pagesize": 30,
"home_directory": null,
"descending": null,
"cwd_set": true,
"file_filter": "any",
"current_dir_path": "s3a://demo-gethue",
"is_fs_superuser": false,
"groups": [],
"users": [],
"superuser": null,
"supergroup": null,
"is_sentry_managed": false,
"apps": [
"filebrowser",
"metastore",
"useradmin",
"indexer",
"notebook"
],
"show_download_button": true,
"show_upload_button": true,
"is_embeddable": false,
"s3_listing_not_allowed": ""
}
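
To consume that listing programmatically, you can extract just the path of each entry. Again a sketch, assuming jq and the ACCESS_TOKEN variable from the earlier helper:

# Pull out just the path of each entry from the listing response
curl -s -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue -H "Authorization: Bearer $ACCESS_TOKEN" | jq -r '.files[].path'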

Some of the parameters:

  • pagesize=45 (number of items to return)
  • pagenum=1 (page number to fetch)
  • filter= (text to match in file names; can be empty)
  • sortby=name (field to use for sorting)
  • descending=false (sort order; false keeps the sorting ascending alphabetical)

e.g. pagesize=45&pagenum=1&filter=&sortby=name&descending=false
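
Put together, a paginated and sorted listing call would look like this (same view endpoint as above, with the query string appended; ACCESS_TOKEN is the variable from the earlier sketch):

curl -X GET "https://demo.gethue.com/api/storage/view=s3a://demo-gethue?pagesize=45&pagenum=1&filter=&sortby=name&descending=false" -H "Authorization: Bearer $ACCESS_TOKEN"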

Then peek at the data of the s3a://demo-gethue/data/web_logs/index_data.csv file:

curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue/data/web_logs/index_data.csv -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"

{
"show_download_button": true,
"is_embeddable": false,
"editable": false,
"mtime": "October 31, 2016 03:34 PM",
"rwx": "-rw-rw-rw-",
"path": "s3a://demo-gethue/data/web_logs/index_data.csv",
"stats": {
"size": 6199593,
"aclBit": false,
...............
"contents": "code,protocol,request,app,user_agent_major,region_code,country_code,id,city,subapp,latitude,method,client_ip, user_agent_family,bytes,referer,country_name,extension,url,os_major,longitude,device_family,record,user_agent,time,os_family,country_code3
200,HTTP/1.1,GET /metastore/table/default/sample_07 HTTP/1.1,metastore,,00,SG,8836e6ce-9a21-449f-a372-9e57641389b3,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,"demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:49 +0000] ""GET /metastore/table/default/sample_07 HTTP/1.1"" 200 1041 ""-"" ""Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)""
",Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:49Z,Other,SGP
200,HTTP/1.1,GET /metastore/table/default/sample_07 HTTP/1.1,metastore,,00,SG,6ddf6e38-7b83-423c-8873-39842dca2dbb,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,"demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:50 +0000] ""GET /metastore/table/default/sample_07 HTTP/1.1"" 200 1041 ""-"" ""Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)""
",Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:50Z,Other,SGP
...............
}

Some of the parameters:

  • offset=0
  • length=204800
  • compression=none
  • mode=text

e.g. ?offset=0&length=204800&compression=none&mode=text
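
For example, reading the first 200 KB of the file as text would look like this (same view endpoint, with the query string from above appended; ACCESS_TOKEN as in the earlier sketch):

curl -X GET "https://demo.gethue.com/api/storage/view=s3a://demo-gethue/data/web_logs/index_data.csv?offset=0&length=204800&compression=none&mode=text" -H "Authorization: Bearer $ACCESS_TOKEN"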

And then decide to download it:

curl -X GET https://demo.gethue.com/api/storage/download=s3a://demo-gethue/data/web_logs/index_data.csv -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"
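
To save the content to a local file instead of printing it to the terminal, curl's standard -o flag works as usual (ACCESS_TOKEN as in the earlier sketch):

# Download the file and write it locally as index_data.csv
curl -X GET https://demo.gethue.com/api/storage/download=s3a://demo-gethue/data/web_logs/index_data.csv -H "Authorization: Bearer $ACCESS_TOKEN" -o index_data.csv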

It is also possible to upload your data directly (if you have the proper write permissions in the remote destination folder).

Here we send the local file README.md to the remote s3a://demo-gethue/web_log_data/ directory:

curl -X POST "https://demo.gethue.com/api/storage/upload/file?dest=s3a://demo-gethue/web_log_data/" --form hdfs_file=@README.md -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"

Note: the hdfs_file parameter is a relative or absolute path to a local file. The name is currently confusing; read it more like local_file (i.e. it is not related to HDFS only).
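
A quick way to check that the upload landed is to list the destination directory right after, optionally with the filter parameter described earlier. A sketch, again using the ACCESS_TOKEN variable from the helper above:

# List the destination directory and filter on the file name to confirm the upload
curl -X GET "https://demo.gethue.com/api/storage/view=s3a://demo-gethue/web_log_data/?filter=README" -H "Authorization: Bearer $ACCESS_TOKEN"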

Then what?

When the data is stored in the cloud, it becomes easy to create a SQL table and query it. One way is to open up the File Browser and copy the path of the data into a CREATE TABLE statement, or just go via the Create Table Wizard, which does all the work for you.

Note that small data files don’t even need to go via the cloud storage and can be directly uploaded via drag & drop in the Web interface or the Importer API, something that will be demoed next time, so stay tuned!

Directly uploading a file and getting a SQL table ready to query

Proper security

The timing is also good. File listing (for HDFS, the Hadoop file system) has been present since day one. Later on, AWS S3, Azure Storage and Google Cloud Storage (beta) were added, but they were lacking fine-grained security (i.e. all the users were sharing the same credentials, which is not good).

This is not true anymore: the signed URL technology of these cloud storages is now leveraged under the hood so that each user performs file operations under their own distinct credentials. This allows true self service instead of restricting data uploads to admins only. Users can be trusted to upload their own files and analyze them without contacting anybody else. Another bottleneck removed!

If interested in more technical details, read more about AWS presigned URLs or Azure shared access signatures (SAS).

Open the Create Table Wizard or copy a file URI
Hue or Compose app contacting a middleware service that converts raw calls to object storages into custom signed URLs in order to provide fine grained authorization

Sum-up

Now there is no excuse not to be data driven and to provide self-service analytics to your hungry users ;)

Using GCP or other storages? Let us know!

And in case you missed it, the coolest API is actually the Execute a SQL query one, so play with it!

Onwards!

Romain
