How to access EODATA using boto3 on CREODIAS
In this article you will learn how to access the EODATA repository using a Python library called boto3, running on a Linux or Windows virtual machine within the CREODIAS cloud.
What Are We Going To Cover
Installing boto3
How to execute scripts found in this article
Browsing EODATA
Downloading a single file from the EODATA repository
Prerequisites
No. 1 Account
You need a CREODIAS hosting account with access to the Horizon interface: https://horizon.cloudferro.com.
No. 2 A virtual machine
You need a virtual machine running on the CREODIAS cloud. This article is written for the following operating systems:
Ubuntu 22.04
Windows Server 2019 on CF2 cloud and Windows Server 2022 on other clouds
Other operating systems might also work, but they are outside the scope of this article and might require adjusting the commands provided here.
Either way, your virtual machine needs to have access to the network which gives access to the EODATA repository. This network is either called eodata or has a name which starts with eodata_.
Linux VM
You can create a Linux virtual machine by following one of these articles:
Windows VM
To learn how to create a Windows virtual machine, see this article: How to create Windows VM on OpenStack Horizon and access it via web console on CREODIAS
If you are using the web console to access your Windows virtual machine, you can open the article you are currently reading in a web browser (like Microsoft Edge) installed on that virtual machine and copy the Python code to your chosen text editor.
No. 3 Python
You need Python installed on your virtual machine.
If you are using Linux, this article can help: How to install Python virtualenv or virtualenvwrapper on CREODIAS
And on Windows, you can follow this article: How to install Python in Windows on CREODIAS
No. 4 Obtained access and secret key
To access EODATA, you need to obtain your access key and secret key. You can do it by following this article: How to get credentials used for accessing EODATA on a cloud VM on CREODIAS
No. 5 Basic knowledge about Python
boto3 is a Python library so you have to know your way around Python.
Installing boto3
Follow the procedure for installing boto3 that matches your operating system:
Installing boto3 on Linux
If you are using a Python environment such as virtualenv, enter the environment in which you wish to install boto3. In it, execute the following command:
pip3 install boto3
You can also install the package globally:
sudo apt install python3-boto3
Installing boto3 on Windows
Follow this article to install boto3 on Windows: How to Install Boto3 in Windows on CREODIAS
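Before moving on, it can help to confirm that the interpreter you will use for the scripts actually sees the package. The following is a minimal sketch, not part of the original procedure; the helper name module_is_installed is hypothetical and only standard-library calls are used for the check itself.

```python
import importlib.util


def module_is_installed(module_name):
    """Return True if the current interpreter can locate the named module."""
    return importlib.util.find_spec(module_name) is not None


if module_is_installed("boto3"):
    import boto3
    # boto3 exposes its version string as boto3.__version__
    print("boto3 version:", boto3.__version__)
else:
    print("boto3 was not found; repeat the installation step for your system")
```

If you work inside a virtualenv, run this check from within that environment, since a package installed there is not visible to the system-wide interpreter.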
How to execute scripts found in this article
The method of executing the scripts depends on your operating system.
How to execute scripts using Linux command line
Open a text editor of your choice like nano or vim. Paste the script. Perform appropriate modifications to the code as instructed (like assigning values to variables). Save the file.
Once you have exited from the text editor, execute the python3 command followed by the name of your script from the directory it is in. For example:
python3 browse.py
The script should then run and print its output, if any, to the terminal.
How to execute scripts using Windows command prompt
Open a plain text editor (like Notepad). Paste the script. Perform appropriate modifications to the code as instructed (like assigning values to variables). Save the file with the .py extension (make sure that Windows does not append a .txt extension to it).
Open the command prompt (cmd.exe). Navigate to the directory in which the script is located using the cd command, for example:
cd C:\Users\John\scripts
Execute the script using the python command followed by its name, for example:
python browse.py
Browsing EODATA
You can use boto3 to browse the EODATA repository. This section contains the code for doing so. In that code, replace the following variables:
Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
directory | The directory within EODATA repository which you want to explore.
When filling in the variable directory, make sure to follow these rules:
Use slashes / as separators between elements of that path - directories and files
Do not start the path with a slash /
Since the element you are exploring is a directory, finish the path with a slash /
Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)
If you want to explore the root directory of the EODATA repository, assign an empty string to variable directory:
directory=''
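The rules above can also be applied programmatically. The sketch below is one possible way to do it; the function normalize_directory is a hypothetical helper, not part of boto3 or of the original article, and it assumes that any backslash in the input was meant as a slash.

```python
def normalize_directory(path):
    """Return path in the form described by the rules for the directory variable."""
    # Assumption: backslashes were typed by mistake and should be slashes
    cleaned = path.strip().replace("\\", "/")
    # The path must not start with a slash; trailing slashes are re-added below
    cleaned = cleaned.strip("/")
    # The root directory of the repository is represented by an empty string
    if not cleaned:
        return ""
    # A directory path must finish with exactly one slash
    return cleaned + "/"


print(normalize_directory("/Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08"))
# prints: Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/
```

The returned value can be assigned directly to the variable directory in the scripts in this section.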
If you don’t have a directory which you want to explore but you want to simply test this method, you can leave the value which was assigned to variable directory in the example code below.
Variables host and container contain the EODATA endpoint and the name of the container used, respectively. You do not need to modify them.
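The scripts in this article store the access key and secret key directly in the source file. If you would rather keep them out of the file, one option is to read them from environment variables. This is only a sketch: the function read_credentials and the variable names EODATA_ACCESS_KEY and EODATA_SECRET_KEY are hypothetical and are not defined by CREODIAS or boto3.

```python
import os


def read_credentials(environ=None):
    """Return (access_key, secret_key) read from environment variables."""
    if environ is None:
        environ = os.environ
    # Hypothetical variable names chosen for this example
    access_key = environ.get("EODATA_ACCESS_KEY")
    secret_key = environ.get("EODATA_SECRET_KEY")
    if not access_key or not secret_key:
        raise RuntimeError(
            "Set EODATA_ACCESS_KEY and EODATA_SECRET_KEY before running the script"
        )
    return access_key, secret_key
```

With this helper, the two assignments in the scripts could be replaced by access_key, secret_key = read_credentials(), provided that both environment variables were set in the shell beforehand.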
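# Variant using the endpoint http://data.cloudferro.com
import boto3
# Credentials obtained in Prerequisite No. 4
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
# Directory to explore; it must end with a slash
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'http://data.cloudferro.com'
container = 'DIAS'
s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
# With Delimiter='/', the subdirectories of directory are returned under 'CommonPrefixes'
print(s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes'])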
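# Variant using the endpoint http://eodata.cloudferro.com
import boto3
# Credentials obtained in Prerequisite No. 4
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
# Directory to explore; it must end with a slash
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'http://eodata.cloudferro.com'
container = 'DIAS'
s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
# With Delimiter='/', the subdirectories of directory are returned under 'CommonPrefixes'
print(s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes'])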
If you provided your access key and secret key but did not modify the variable directory, the code above will list the products found in the Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ directory of the EODATA repository. In that case, the output should look like this:
[{'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/'}]
This output can be described as a “list of dictionaries”. Each of those dictionaries contains a key called Prefix, whose value is the path to one subdirectory found directly under the explored directory. Instead of printing this list as above, you can loop through it to make the output more legible:
# Variant using the endpoint http://data.cloudferro.com
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'http://data.cloudferro.com'
container = 'DIAS'
s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
# Print one path per line instead of the whole list of dictionaries
for i in s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes']:
    print(i['Prefix'])
# Variant using the endpoint http://eodata.cloudferro.com
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'http://eodata.cloudferro.com'
container = 'DIAS'
s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
# Print one path per line instead of the whole list of dictionaries
for i in s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes']:
    print(i['Prefix'])
This time, the output should show only the paths:
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/
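The scripts above read the 'CommonPrefixes' entry of a single response. Two situations may need extra care: a directory without subdirectories, for which that entry can be absent, and a directory with more entries than the service returns in one call. S3-compatible services commonly cap the number of entries per response, although whether and where the EODATA endpoint does so is an assumption here. The sketch below uses the boto3 paginator for list_objects to handle both cases; the function name list_subdirectories is hypothetical.

```python
def list_subdirectories(s3_client, bucket, prefix):
    """Return the paths of all subdirectories found directly under prefix."""
    # get_paginator returns an object that requests further pages as needed
    paginator = s3_client.get_paginator("list_objects")
    found = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # 'CommonPrefixes' is missing from a page that has no subdirectories
        for entry in page.get("CommonPrefixes", []):
            found.append(entry["Prefix"])
    return found
```

With the client created as in the scripts above, the call would be list_subdirectories(s3, container, directory), and the result is a plain list of strings that can be printed or processed further.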
Downloading a single file from the EODATA repository
This section covers how to download a file from the EODATA repository.
The script below should download that file to the directory from which the script is executed. If that directory already contains a file with the same name as the one you are downloading, it will be overwritten without a prompt for confirmation.
In the code below, replace the following variables:
Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
key | Full path (including folders) of a file you want to download from EODATA repository.
When filling in variable key, make sure to follow these rules:
Use slashes / as separators between elements of that path - directories and files
Do not start or finish the path with a slash /
Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)
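As an illustration of these rules, the sketch below checks a candidate value for the variable key and derives the local file name in the same way as the download script does, by taking the last element of the path. Both function names, check_key and local_filename, are hypothetical helpers introduced for this example.

```python
def check_key(key):
    """Return a list of rule violations found in key; empty if there are none."""
    problems = []
    if key.startswith("/"):
        problems.append("key starts with a slash")
    if key.endswith("/"):
        problems.append("key ends with a slash")
    if "\\" in key:
        problems.append("key contains a backslash instead of a slash")
    return problems


def local_filename(key):
    """Return the last path element, which is used as the local file name."""
    return key.split("/")[-1]


example = "Sentinel-2/some_product/some_file.xml"  # made-up path for illustration
print(check_key(example))       # prints: []
print(local_filename(example))  # prints: some_file.xml
```

The example path is invented and does not refer to a real product; substitute the path of a file that exists in the repository.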
If you don’t have a file which you want to download but you simply want to test this method of downloading files, you can leave the value which was assigned to variable key in the example code below.
Again, variables host and container contain the EODATA endpoint and the name of the container being used, respectively. You do not need to modify them.
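# Variant using the endpoint http://data.cloudferro.com
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
# Full path of the file within the EODATA repository
key = 'Landsat-5/TM/L1T/2011/11/16/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG'
host = 'http://data.cloudferro.com'
container = 'DIAS'
s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
bucket = s3.Bucket(container)
# Use the last element of the path as the local file name
filename = key.split('/')[-1]
# Download into the current directory, overwriting any file with the same name
bucket.download_file(key, filename)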
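# Variant using the endpoint http://eodata.cloudferro.com
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
# Full path of the file within the EODATA repository
key = 'Landsat-5/TM/L1T/2011/11/16/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG'
host = 'http://eodata.cloudferro.com'
container = 'DIAS'
s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url=host)
bucket = s3.Bucket(container)
# Use the last element of the path as the local file name
filename = key.split('/')[-1]
# Download into the current directory, overwriting any file with the same name
bucket.download_file(key, filename)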
If you provided your access key and secret key but did not change the contents of variable key, the code should download the file called
LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG
which is located in the root directory of the product
LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1
After executing the script, the output should be empty. The downloaded file should nevertheless be visible in the directory from which the script was executed.
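Because the download script overwrites an existing file of the same name without asking, you may want to pick a free local name before downloading. The following is a minimal sketch of one way to do that using only the standard library; the function choose_local_name and its numbering scheme are hypothetical and are not part of boto3.

```python
import os


def choose_local_name(filename, directory="."):
    """Return filename, or a numbered variant if that name already exists."""
    stem, extension = os.path.splitext(filename)
    candidate = filename
    counter = 1
    # Keep appending a counter until no file with that name is present
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{stem}_{counter}{extension}"
        counter += 1
    return candidate
```

In the download script, the line that assigns filename could then become filename = choose_local_name(key.split('/')[-1]), so that an earlier download is kept rather than replaced.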
What To Do Next
You can further modify these scripts so that they better suit your needs, or integrate them with your own applications. These scripts might also work in other development environments, but that is outside the scope of this article.
boto3 can also be used to access object storage containers from the CREODIAS cloud: How to access object storage from CREODIAS using boto3