Importing data
This section shows how to import data from various sources into the Engine.
Python users: Saving dataframes from Pandas and Dask into a dataset
If you intend to load data into pandas or dask, process it, and save it in a tabular file format to import into the Engine later, make sure that row indexes containing useful data are saved and that unnecessary indexes are skipped:
import pandas as pd
data: pd.DataFrame = pd.read_csv('my_file.csv') # Or read_json(..., lines=True) or read_parquet(...)
# You process your data further, and finalize
final_data = my_processing_pipeline(data)
# If the index in the final data is not a range index and contains useful
# information such as timestamp or group names, use reset_index to convert them into columns
final_data = final_data.reset_index(names_of_indexes_you_want_to_retain)
# Then discard the row indexes that do not contain useful data
final_data = final_data.reset_index(drop=True)
# The resulting data now has a range index, which contains no useful data and
# if kept results in an unnamed column. Hence, use the option to skip it before
# saving to appropriate format(s) for importing:
final_data.to_csv('data_to_import.csv', index=False)
# orient='records' never writes the index, so no index option is needed here
final_data.to_json('data_to_import.jsonl', orient='records', lines=True)
final_data.to_parquet('data_to_import.parquet', index=False)
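To make the effect of these steps concrete, here is a minimal sketch on made-up data. The column names `region` and `sales` are assumptions chosen for illustration. It shows a meaningful group index being turned back into a column and the result written without a stray index column.

```python
import io

import pandas as pd

# Made-up data, for illustration only.
raw = pd.DataFrame({
    "region": ["north", "north", "south"],
    "sales": [10, 20, 5],
})

# groupby moves "region" into the row index of the result.
totals = raw.groupby("region").sum()

# Convert the meaningful index back into an ordinary column.
totals = totals.reset_index()

# Write without the remaining range index, which carries no information.
buffer = io.StringIO()
totals.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The header of the written CSV contains only `region` and `sales`; there is no unnamed first column.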
Before you begin — Workarounds
Because of known issues and limitations in the current release, some data sources need a few offline workarounds before the Engine can import them. Upcoming releases aim to remove this extra work. This section describes the workaround to use for each affected category of dataset.
Compressed files (ending in .zip, .gz, .bz2 or .xz)
The Engine currently does not support importing tabular files (csv or jsonl) stored in compressed formats. You will need to decompress them offline before importing.
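One possible way to decompress such files offline is sketched below, using only the Python standard library. The function name `decompress` and the `out_dir` parameter are assumptions made for this example; they are not part of the Engine SDK.

```python
import bz2
import gzip
import lzma
import shutil
import zipfile
from pathlib import Path

# Openers for single-file compression formats, keyed by file extension.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def decompress(path, out_dir="."):
    """Decompress `path` into `out_dir` and return the extracted paths."""
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    if src.suffix == ".zip":
        # A zip archive may hold several files; extract them all.
        with zipfile.ZipFile(src) as archive:
            archive.extractall(out)
            return [out / name for name in archive.namelist()]

    opener = _OPENERS.get(src.suffix)
    if opener is None:
        raise ValueError(f"Unsupported compression format: {src.suffix}")

    # Dropping the last extension gives the target name: data.csv.gz -> data.csv
    target = out / src.stem
    with opener(src, "rb") as compressed, open(target, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return [target]
```

For example, `decompress('german-credit.csv.gz', out_dir='decompressed')` would write `decompressed/german-credit.csv`, which can then be imported as a regular CSV file.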
SAS (.sas7bdat), STATA (.dta), and SPSS (.sav, .zsav, .por) formats
The Engine currently does not support importing files in SAS, Stata, or SPSS format.
Save all such files in .csv format. If you are comfortable writing small scripts in R or Python, you can convert them to .csv using one of the following options:
- Pandas functions read_sas, read_spss, read_stata
- The R packages readr, readxl, and haven
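As a rough sketch of the pandas option, the helper below picks a reader by file extension and writes a .csv file next to the original. The name `convert_to_csv` is an assumption made for this example. Note that `read_spss` needs the optional `pyreadstat` package, and that `.por` files are not covered by this sketch.

```python
from pathlib import Path

import pandas as pd

# pandas readers keyed by file extension.
# read_spss requires the optional pyreadstat package to be installed.
_READERS = {
    ".sas7bdat": pd.read_sas,
    ".dta": pd.read_stata,
    ".sav": pd.read_spss,
    ".zsav": pd.read_spss,
}


def convert_to_csv(path):
    """Read a SAS, Stata, or SPSS file and write a .csv file next to it."""
    src = Path(path)
    reader = _READERS.get(src.suffix.lower())
    if reader is None:
        raise ValueError(f"No pandas reader known for {src.suffix}")

    frame = reader(src)

    # Skip the row index so the CSV has no unnamed first column.
    target = src.with_suffix(".csv")
    frame.to_csv(target, index=False)
    return target
```

Calling `convert_to_csv('survey.dta')` would produce `survey.csv` in the same folder. It is worth checking the converted file by eye, since text encodings and labelled values can differ between these formats.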
If your data is in Excel format, first save each sheet as a separate .csv file. Then upload each file as a separate dataset.
Nested jsonl and jsonlines files
If you intend to ingest nested JSON lines files into tabular data, you will need to unnest them yourself. Use an appropriate tool to perform this offline:
- If you are importing from MongoDB, make another collection in your database and import from it: use an aggregation pipeline with the $unwind aggregation stage, coupled with necessary aggregation operators such as $arrayToObject and $objectToArray.
- If you have a local jsonlines file with nested data and are familiar with pandas in Python, use the JSON normalization functionality from pandas.
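The pandas route could look roughly like the sketch below, which uses `pandas.json_normalize` to flatten nested objects into top-level columns. The function name `flatten_jsonl` and the `_` separator are assumptions made for this example.

```python
import json

import pandas as pd


def flatten_jsonl(in_path, out_path, sep="_"):
    """Flatten nested objects in a JSON lines file into top-level columns."""
    with open(in_path, encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]

    # {"address": {"city": "Oslo"}} becomes a column named "address_city".
    flat = pd.json_normalize(records, sep=sep)

    # Write one flat JSON object per line.
    flat.to_json(out_path, orient="records", lines=True)
    return flat
```

This handles nested objects only. Nested arrays stay as lists inside a single column, so they need a separate step (for example the `record_path` argument of `json_normalize`) before the file is truly tabular.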
Import data from local files
Import data from CSV data files
from aiaengine import Org, Project, FileSource, Column, DataType
# create a new demo project in the org
org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
# import the `German Credit Data` dataset
data_file = 'examples/datasets/german-credit.csv'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_csv(data_file, header=0))
dataset = project.create_dataset(
name=f"German Credit Data",
data_source=FileSource(
file_urls=[data_file],
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.file.CSVFileSettings;
import com.aiaengine.datasource.file.FileSourceRequest;
import com.aiaengine.datasource.file.FileType;
import com.aiaengine.org.request.CreateProjectRequest;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportCsvApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
Project project = org.createProject(CreateProjectRequest.builder()
.name("Demo project using Java SDK")
.description("Your demo project")
.build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
String dataFilePath = "examples/datasets/german-credit.csv";
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource localDataSource = engine.buildFileSource(FileSourceRequest.builder()
.fileType(FileType.CSV)
.url(dataFilePath)
.fileSettings(new CSVFileSettings())
.schema(new Schema(columns))
.build());
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data")
.dataSource(localDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
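The schema in the examples above lists every column by hand. As a starting point for writing such a list, the sketch below guesses whether each CSV column is numeric or text by sampling its values. It is not the SDK's `print_schema` utility; the function name `guess_schema` is an assumption, and the strings 'Numeric' and 'Text' simply mirror the `DataType` member names used above.

```python
import csv


def _is_number(value):
    """Return True if `value` parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_schema(csv_path, sample_rows=1000):
    """Return (column_name, 'Numeric' or 'Text') pairs guessed from a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        names = reader.fieldnames or []
        numeric = {name: True for name in names}
        seen = {name: False for name in names}
        for row_number, row in enumerate(reader):
            if row_number >= sample_rows:
                break
            for name in names:
                value = (row[name] or "").strip()
                if not value:
                    continue  # ignore empty cells when guessing
                seen[name] = True
                if not _is_number(value):
                    numeric[name] = False
    # A column with no values at all is reported as Text.
    return [
        (name, "Numeric" if numeric[name] and seen[name] else "Text")
        for name in names
    ]
```

The guesses should be reviewed before use: a column of numeric codes, for instance, may be better treated as text.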
Import from Excel file
from aiaengine import Org, Project, FileSource, Column, FileType, DataType, ExcelSettings
# create a new demo project in the org
org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
# import the `German Credit Data` dataset
data_file = 'examples/datasets/german-credit.xlsx'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_excel(data_file, header=0))
dataset = project.create_dataset(
name=f"German Credit Data (Excel)",
data_source=FileSource(
file_urls=[data_file],
file_type=FileType.Excel,
file_settings=ExcelSettings(
data_range='A1:U1001'
),
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.file.ExcelFileSettings;
import com.aiaengine.datasource.file.FileSourceRequest;
import com.aiaengine.datasource.file.FileType;
import com.aiaengine.org.request.CreateProjectRequest;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportExcelApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
Project project = org.createProject(CreateProjectRequest.builder()
.name("Demo project using Java SDK")
.description("Your demo project")
.build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
String dataFilePath = "examples/datasets/german-credit.xlsx";
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource localDataSource = engine.buildFileSource(FileSourceRequest.builder()
.fileType(FileType.EXCEL)
.url(dataFilePath)
.fileSettings(new ExcelFileSettings("A1:U1001"))
.schema(new Schema(columns))
.build());
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data (Excel)")
.dataSource(localDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
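The range 'A1:U1001' used in the Excel examples appears to cover the 21 schema columns (A to U) and 1001 rows, presumably one header row plus 1000 data rows. The sketch below shows how such a range could be computed for a sheet of a different size. The helper names `column_letters` and `data_range` are assumptions made for this example and are not part of the SDK.

```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("Column index must be 1 or greater")
    letters = ""
    while index > 0:
        # Excel columns are a base-26 system without a zero digit.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def data_range(n_columns, n_data_rows, header_rows=1):
    """Build an A1-style range that starts at cell A1."""
    last_row = header_rows + n_data_rows
    return f"A1:{column_letters(n_columns)}{last_row}"


print(data_range(21, 1000))  # A1:U1001
```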
Import data from Parquet files
from aiaengine import Org, Project, FileSource, Column, FileType, DataType
# create a new demo project in the org
org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
# import the `German Credit Data` dataset
data_file = 'examples/datasets/german-credit.parquet'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_parquet(data_file))
dataset = project.create_dataset(
name=f"German Credit Data (Parquet)",
data_source=FileSource(
file_urls=[data_file],
file_type=FileType.Parquet,
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.EmptyFileSettings;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.file.FileSourceRequest;
import com.aiaengine.datasource.file.FileType;
import com.aiaengine.org.request.CreateProjectRequest;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportParquetApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
Project project = org.createProject(CreateProjectRequest.builder()
.name("Demo project using Java SDK")
.description("Your demo project")
.build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
String dataFilePath = "examples/datasets/german-credit.parquet";
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource localDataSource = engine.buildFileSource(FileSourceRequest.builder()
.fileType(FileType.PARQUET)
.url(dataFilePath)
.fileSettings(new EmptyFileSettings())
.schema(new Schema(columns))
.build());
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data (Parquet)")
.dataSource(localDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
Import data from JSON files
from aiaengine import Org, Project, FileSource, Column, FileType, DataType
# create a new demo project in the org
org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
# import the `German Credit Data` dataset
data_file = 'examples/datasets/german-credit.jsonl'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_json(data_file, orient='records', lines=True))
dataset = project.create_dataset(
name=f"German Credit Data (JSONL)",
data_source=FileSource(
file_urls=[data_file],
file_type=FileType.JSONLine,
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.EmptyFileSettings;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.file.FileSourceRequest;
import com.aiaengine.datasource.file.FileType;
import com.aiaengine.org.request.CreateProjectRequest;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportJsonlineApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
Project project = org.createProject(CreateProjectRequest.builder()
.name("Demo project using Java SDK")
.description("Your demo project")
.build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
String dataFilePath = "examples/datasets/german-credit.jsonl";
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource localDataSource = engine.buildFileSource(FileSourceRequest.builder()
.fileType(FileType.JSON_LINE)
.url(dataFilePath)
.fileSettings(new EmptyFileSettings())
.schema(new Schema(columns))
.build());
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data (JSONL)")
.dataSource(localDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
Importing data from remote files
Import data from public HTTP
from aiaengine import Org, Project, FileSource, Column, DataType
# create a new demo project in the org
org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
# import the `German Credit Data` dataset
data_file = 'https://docs.aiaengine.com/downloads/datasets/german-credit.csv'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_csv(data_file, header=0))
dataset = project.create_dataset(
name=f"German Credit Data (CSV - HTTP)",
data_source=FileSource(
file_urls=[data_file],
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.file.CSVFileSettings;
import com.aiaengine.datasource.file.FileSourceRequest;
import com.aiaengine.datasource.file.FileType;
import com.aiaengine.org.request.CreateProjectRequest;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportCsvHttpApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
Project project = org.createProject(CreateProjectRequest.builder()
.name("Demo project using Java SDK")
.description("Your demo project")
.build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
String dataFilePath = "https://docs.dev.aiaengine.com/downloads/datasets/german-credit.csv";
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource localDataSource = engine.buildFileSource(FileSourceRequest.builder()
.fileType(FileType.CSV)
.url(dataFilePath)
.fileSettings(new CSVFileSettings())
.schema(new Schema(columns))
.build());
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data (CSV - HTTP)")
.dataSource(localDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
Import data from database systems
Import data from PostgreSQL, MySQL, SQL Server, and MongoDB
from aiaengine import Org, Project, DatabaseSource, DatabaseType, Column, DataType
# create a new demo project in the org
org_id = 'b6240512-cd17-43a0-8297-84c51c1bc5a0' # replace with your org ID
org = Org(id=org_id)
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")
# or you can get an existing project that you want to work on
# project = Project(id='ID_of_your_project') # replace with your own project ID
dataset = project.create_dataset(
name=f"German Credit Data (PostgreSQL)",
data_source=DatabaseSource(
type=DatabaseType.PostgreSQL, # supported database types: PostgreSQL, MySQL, SQLServer, MongoDB
host='postgresql.default.svc',
port=5432,
username='postgres',
password='postgres',
database='postgres',
table='german_credit',
schema=[
Column('checking_status', DataType.Text),
Column('duration', DataType.Numeric),
Column('credit_history', DataType.Text),
Column('purpose', DataType.Text),
Column('credit_amount', DataType.Numeric),
Column('savings_status', DataType.Text),
Column('employment', DataType.Text),
Column('installment_commitment', DataType.Numeric),
Column('personal_status', DataType.Text),
Column('other_parties', DataType.Text),
Column('residence_since', DataType.Numeric),
Column('property_magnitude', DataType.Text),
Column('age', DataType.Numeric),
Column('other_payment_plans', DataType.Text),
Column('housing', DataType.Text),
Column('existing_credits', DataType.Numeric),
Column('job', DataType.Text),
Column('num_dependents', DataType.Numeric),
Column('own_telephone', DataType.Text),
Column('foreign_worker', DataType.Text),
Column('class', DataType.Text)
]
)
)
print(dataset.id)
package com.aiaengine.examples.dataset;
import com.aiaengine.Dataset;
import com.aiaengine.Engine;
import com.aiaengine.Org;
import com.aiaengine.Project;
import com.aiaengine.datasource.DataSource;
import com.aiaengine.datasource.Schema;
import com.aiaengine.datasource.database.DatabaseConnection;
import com.aiaengine.datasource.database.DatabaseType;
import com.aiaengine.project.request.CreateDatasetRequest;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
public class ImportPostgresApp {
public static void main(String[] args) throws FileNotFoundException {
Engine engine = new Engine();
// create a new demo project in the org
Org org = engine.getOrg("cae24b10-e6b0-4d61-8cef-a9f4b8f6133d"); // replace with your org ID
// Project project = org.createProject(CreateProjectRequest.builder()
// .name("Demo project using Java SDK")
// .description("Your demo project")
// .build());
// or you can get an existing project that you want to work on
// Project project = engine.getProject("ID_of_your_project"); // replace with your own project ID
Project project = engine.getProject("403a448d-9d86-497f-a9f6-414afa72a415"); // replace with your own project ID
DatabaseConnection connection = DatabaseConnection.builder()
.type(DatabaseType.POSTGRES)
.host("postgresql.default.svc")
.port(5432)
.user("postgres")
.password("postgres")
.databaseName("postgres")
.table("german_credit")
.build();
List<Schema.Column> columns = new ArrayList<>();
columns.add(new Schema.Column("checking_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("duration", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("credit_history", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("purpose", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("credit_amount", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("savings_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("employment", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("installment_commitment", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("personal_status", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("other_parties", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("residence_since", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("property_magnitude", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("age", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("other_payment_plans", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("housing", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("existing_credits", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("job", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("num_dependents", Schema.SemanticType.NUMERIC));
columns.add(new Schema.Column("own_telephone", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("foreign_worker", Schema.SemanticType.TEXT));
columns.add(new Schema.Column("class", Schema.SemanticType.TEXT));
DataSource dbDataSource = engine.buildDatabaseSource(connection, new Schema(columns));
Dataset dataset = project.createDataset(CreateDatasetRequest.builder()
.name("German Credit Data (PostgreSQL)")
.dataSource(dbDataSource)
.timeout(900)
.build());
System.out.println(dataset.getId());
}
}
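The database examples above hard-code the connection details, including the password. One alternative is to read them from environment variables, as in the sketch below. The variable names (`ENGINE_DB_HOST` and so on) and the function name `database_settings_from_env` are assumptions made for this example; only the returned keys follow the keyword arguments shown in the Python `DatabaseSource` example.

```python
import os


def database_settings_from_env(prefix="ENGINE_DB_"):
    """Collect connection settings from environment variables.

    The variable names (ENGINE_DB_HOST, ENGINE_DB_PORT, ...) are hypothetical.
    """
    required = ["HOST", "USERNAME", "PASSWORD", "DATABASE", "TABLE"]
    missing = [key for key in required if not os.environ.get(prefix + key)]
    if missing:
        names = ", ".join(prefix + key for key in missing)
        raise RuntimeError(f"Missing environment variables: {names}")

    settings = {key.lower(): os.environ[prefix + key] for key in required}
    # Default to the PostgreSQL port used in the example above.
    settings["port"] = int(os.environ.get(prefix + "PORT", "5432"))
    return settings
```

The resulting dictionary could then be unpacked into the data source, for example `DatabaseSource(type=DatabaseType.PostgreSQL, schema=[...], **settings)`. That call is untested here and assumes the keyword names shown in the example above.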