
Metavault Configuration

The component name of metaVault is DFAKTO_METAVAULT. You will need to prefix the environment variables with this value.

Authentication

This section configures the identity provider used to manage users and to log in to the different components of the solution. See Identity provider configuration for help configuring an identity provider for beVault.

  • The default Scope is: “profile email openid”

AppData JSON configuration file
JSON
{
  "Authentication": {
      "Authority": "https://login.microsoftonline.com/[TENANTID]/v2.0",
      "ClientId": "[CLIENTID]",
      "Audience": "[CLIENTID]",
      "Scope":"[COMPLETE_SCOPE_API] profile email openid",
      "RequireHttpsMetadata": true
  }
}

The values in the example above are formatted for an Azure Active Directory configuration.

Environment variables
YAML
Authentication__Audience=[CLIENTID]
Authentication__Authority=https://login.microsoftonline.com/[TENANTID]/v2.0
Authentication__ClientId=[CLIENTID]
Authentication__Scope=[COMPLETE_SCOPE_API] profile email openid
Authentication__RequireHttpsMetadata=true

Don’t forget to prefix the variables with the component’s name

RequireHttpsMetadata defines the protocol used to communicate with the OAuth authentication provider (e.g. Keycloak). “True”, which is the default, requires HTTPS. Local deployments may want to set it to “false”.
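
For example, once prefixed with the component’s name, the Azure Active Directory variables above become the following (the [TENANTID], [CLIENTID] and [COMPLETE_SCOPE_API] placeholders must still be replaced with your own values):

YAML
DFAKTO_METAVAULT_Authentication__Authority=https://login.microsoftonline.com/[TENANTID]/v2.0
DFAKTO_METAVAULT_Authentication__ClientId=[CLIENTID]
DFAKTO_METAVAULT_Authentication__Audience=[CLIENTID]
DFAKTO_METAVAULT_Authentication__Scope=[COMPLETE_SCOPE_API] profile email openid
DFAKTO_METAVAULT_Authentication__RequireHttpsMetadata=true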

Servers

In order to create an environment in beVault, and therefore a database to store your data vault, you will need to configure at least one server. You can add as many servers as you need. This can be useful, for example, to host your testing environment on a different server from your production one.

We recommend having all your environments on the same type of database to avoid issues with your custom SQL code (information mart, hard rules and data quality controls).

Supported target databases

Here are the currently supported databases where you can deploy your Data Vault.

  • PostgreSQL

  • Snowflake

  • SQL Server

  • IbmDb2 (alpha)


Configuration

The configuration may differ a little depending on the type of database you target.

AppData JSON configuration file
JSON
{
  "Servers": {
    "PostgreSQL": {
      "Name": "",
      "allowCustomEnvironmentDatabaseName": false,
      "DatabaseType": "PostgreSQL",
      "Username": "",
      "Password": "",
      "ReadOnlyUsername": "",
      "ReadOnlyPassword": "",
      "ConnectionStringSuffix": "",
      "EngineParameters": {
        "FORCE_LOWERCASE": true
      }
    },
    "SQL Server": {
      "Name": "localhost",
      "allowCustomEnvironmentDatabaseName": false,
      "DatabaseType": "SQLServer",
      "Username": "",
      "Password": "",
      "ReadOnlyUsername":"",
      "ReadOnlyPassword":"",
      "ConnectionStringSuffix":"",
    },
    "Snowflake": {
      "Name": "<account_identifier>.snowflakecomputing.com",
      "allowCustomEnvironmentDatabaseName": false,
      "DatabaseType": "Snowflake",
      "Username": "MegaAdmin",
      "Password": "password",
      "ReadOnlyUsername":"",
      "ReadOnlyPassword":"",
      "ConnectionStringSuffix":"",
    },
    "IbmDb2": {
      "Name": "the.host.of.my.db.com",
      "Port": 50000,
      "allowCustomEnvironmentDatabaseName": true,
      "DatabaseType": "IbmDb2",
      "Username": "db2inst1",
      "Password": "password",
      "ReadOnlyUsername":"db2inst1",
      "ReadOnlyPassword":"password"
    }
  }
}
Environment variables
YAML
Servers__PostgreSQL__allowCustomEnvironmentDatabaseName=False
Servers__PostgreSQL__ConnectionStringSuffix=
Servers__PostgreSQL__DatabaseType=PostgreSQL
Servers__PostgreSQL__EngineParameters=
Servers__PostgreSQL__EngineParameters__FORCE_LOWERCASE=True
Servers__PostgreSQL__Name=
Servers__PostgreSQL__Password=
Servers__PostgreSQL__ReadOnlyPassword=
Servers__PostgreSQL__ReadOnlyUsername=
Servers__PostgreSQL__Username=

Servers__Snowflake__allowCustomEnvironmentDatabaseName=False
Servers__Snowflake__ConnectionStringSuffix=
Servers__Snowflake__DatabaseType=Snowflake
Servers__Snowflake__Name=<account_identifier>.snowflakecomputing.com
Servers__Snowflake__Password=password
Servers__Snowflake__ReadOnlyPassword=
Servers__Snowflake__ReadOnlyUsername=
Servers__Snowflake__Username=MegaAdmin

Servers__SQL Server__allowCustomEnvironmentDatabaseName=False
Servers__SQL Server__ConnectionStringSuffix=
Servers__SQL Server__DatabaseType=SQLServer
Servers__SQL Server__Name=localhost
Servers__SQL Server__Password=
Servers__SQL Server__ReadOnlyPassword=
Servers__SQL Server__ReadOnlyUsername=
Servers__SQL Server__Username=

Servers__IbmDb2__Name=localhost
Servers__IbmDb2__Port=50000
Servers__IbmDb2__DatabaseType=IbmDb2
Servers__IbmDb2__Username=db2inst1
Servers__IbmDb2__Password=password
Servers__IbmDb2__ReadOnlyUsername=db2inst1
Servers__IbmDb2__ReadOnlyPassword=password
Servers__IbmDb2__allowCustomEnvironmentDatabaseName=True

Don’t forget to prefix the variables with the component’s name

  • Name: the host where the DBMS resides, for example localhost

  • allowCustomEnvironmentDatabaseName: Whether users may choose the database name when creating an environment. If set to false, the database name follows the format project_environment.

  • DatabaseType: The type of database. Expected values:

    • PostgreSQL (for versions 12 → 14)

    • PostgreSQL15 (for versions 15 and later)

    • SQLServer

    • Snowflake

    • IbmDb2 (alpha)

  • Username: the name of the database user. This user needs read and write access to the database and the right to create tables, schemas, and databases (if the user has no right to create databases, you will have to create them manually).

  • Password : the password of the database user

  • ReadOnlyUsername: Username of a user with the stg_reader, ref_reader, dv_reader and im_reader roles

  • ReadOnlyPassword : password of the read-only user

  • ConnectionStringSuffix : (1.6.8+) Additional parameters that will be appended to the connection string

  • EngineParameters: A list of parameters to pass to the Datavault Engine. This is specific to the database flavor. See below for details per database type.
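
As an illustration, a minimal prefixed environment configuration for a single PostgreSQL 15 server could look like the sketch below; the host name, credentials and read-only user are placeholders, not defaults shipped with beVault:

YAML
DFAKTO_METAVAULT_Servers__PostgreSQL__Name=db.example.com
DFAKTO_METAVAULT_Servers__PostgreSQL__DatabaseType=PostgreSQL15
DFAKTO_METAVAULT_Servers__PostgreSQL__allowCustomEnvironmentDatabaseName=False
DFAKTO_METAVAULT_Servers__PostgreSQL__Username=bevault_admin
DFAKTO_METAVAULT_Servers__PostgreSQL__Password=change_me
DFAKTO_METAVAULT_Servers__PostgreSQL__ReadOnlyUsername=bevault_reader
DFAKTO_METAVAULT_Servers__PostgreSQL__ReadOnlyPassword=change_me_too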

Engine parameters

PostgreSQL

  • FORCE_LOWERCASE (expected value: True/False): All identifiers generated and used for the database are changed to lowercase. This option is used to mimic the behavior of previous versions of beVault.

Note: If you migrate a beVault from version 2.X to version 3.0.0, you need to activate this option.

Snowflake

  • DATAWAREHOUSE (expected value: a warehouse name): The Snowflake cloud database has the concept of a “Warehouse”, which is the entity doing the work when querying the database. This parameter specifies the warehouse to use. If not set, Snowflake selects the role’s default warehouse.
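
Assuming the Snowflake engine parameter follows the same EngineParameters environment-variable pattern as the PostgreSQL FORCE_LOWERCASE example above (an assumption to verify against your deployment), the warehouse could be selected like this, where REPORTING_WH is a placeholder warehouse name:

YAML
DFAKTO_METAVAULT_Servers__Snowflake__EngineParameters__DATAWAREHOUSE=REPORTING_WH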

Step Functions

This section gives the necessary information for the orchestrator to connect to either AWS Step Functions or, most likely, dFakto states.

 

AppData JSON configuration file
JSON
{
  "stepFunctions": {
    "authenticationKey": "",
    "authenticationSecret": "",
    "serviceUrl": "http://localhost:5500",
    "roleArn": "role",
    "awsRegion": "eu-west-1",
    "ignoreSelfSignedCertificates": false,
    "RegisterRetryDelay": 5,
    "TaskTimeoutSeconds": 10000,
    "HeartBeatTimeoutSeconds": 100,
    "DefaultHeartbeatDelay": 5,
    "DefaultMaxConcurrency" : 5,
    "EnvironmentName": ""
  }
}
Environment variables
YAML
stepFunctions__authenticationKey=
stepFunctions__authenticationSecret=
stepFunctions__awsRegion=eu-west-1
stepFunctions__DefaultHeartbeatDelay=5
stepFunctions__DefaultMaxConcurrency=5
stepFunctions__EnvironmentName=
stepFunctions__HeartBeatTimeoutSeconds=100
stepFunctions__ignoreSelfSignedCertificates=False
stepFunctions__RegisterRetryDelay=5
stepFunctions__roleArn=role
stepFunctions__serviceUrl=http://localhost:5500
stepFunctions__TaskTimeoutSeconds=10000

Don’t forget to prefix the variables with the component’s name

 

  • AuthenticationKey and AuthenticationSecret: allow the component to authenticate securely with the orchestrator. The values need to be retrieved from the orchestrator.

  • serviceUrl: URL where the orchestrator is running (not required if running using AWS Step Functions)

  • roleArn and AWSRegion: if the orchestrator is dFakto states, these can be left at their default values.

  • RegisterRetryDelay: Delay in seconds between two attempts of registering an activity.

  • TaskTimeoutSeconds: METAVAULT CONFIG - When the user pushes a new version of the data vault, state machines whose purpose is to load the data vault are generated and sent to the orchestrator. This field sets the maximum duration of each state of a state machine.

  • HeartbeatTimeoutSeconds: METAVAULT CONFIG - Same as the previous field, except that it sets how long the orchestrator will wait at most between heartbeats from the workers. Can most likely be left at the default value.

  • DefaultMaxConcurrency: WORKERS CONFIG - (1.5.1+) Maximum number of tasks processed at the same time for a given activity. Some Workers can have a hard-coded value and ignore the default configuration.

  • DefaultHeartbeatDelay: WORKERS CONFIG - (1.5.1+) Default delay in seconds between two heartbeats sent to the server while processing a Task. Some Workers can have a hard-coded value and ignore the default configuration.

  • EnvironmentName: WORKERS CONFIG - For the other workers, can be left to any value. The value is used to prefix the activities the worker connects to. For example, if the value chosen is “Prod”, the worker connects to the Prod-gzip activity.
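
As a hypothetical example of the worker-side settings described above, a worker dedicated to a production environment could use the values below (each variable still has to be prefixed with that worker’s component name):

YAML
stepFunctions__EnvironmentName=Prod
stepFunctions__DefaultMaxConcurrency=10
stepFunctions__DefaultHeartbeatDelay=5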

Logs

By default, all applications send reasonable logs to the console; the configuration can be adjusted through the Serilog configuration section.

Here is the configuration of the logs. The most useful field to set is probably the path field, which sets where the logs will be stored on the disk.

For the other options:

  • MinimumLevel: Indicates the level of logs to store. From low to high, the levels are Verbose, Debug, Information, Warning, Error and Fatal.

  • rollOnFileSizeLimit: Indicates whether to create a new log file when the current one reaches its size limit.

  • fileSizeLimitBytes: Indicates the size limit of a log file. Once this size is reached, a new file is created if rollOnFileSizeLimit is set to true.

  • retainedFileCountLimit: Indicates the maximum number of log files to retain; once the limit is reached, the oldest log file is overwritten.

  • formatter (under Args): Decides the format of the logs (text, JSON, …); see the example after the environment variables below.

 

For more options, see https://github.com/serilog/serilog-settings-configuration

AppData JSON configuration file
JSON
{
    "Serilog": {
        "MinimumLevel": {
            "Default": "Information",
            "Override": {
                "Microsoft": "Warning",
                "Microsoft.EntityFrameworkCore": "Warning",
                "System": "Warning"
            }
        },
        "WriteTo": [
            {
                "Name": "Console",
                "Args": {
                    "theme": "Serilog.Sinks.SystemConsole.Themes.AnsiConsoleTheme::Code, Serilog.Sinks.Console",
                    "outputTemplate": "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff} {Level:u3}] {Message:lj} <s:{SourceContext}>{NewLine}{Exception}"
                }
            },
            { 
                "Name": "File",
                "Args":{ 
                    "path": "/var/log/testlog_.txt", 
                    "rollingInterval": "Day",
                    "fileSizeLimitBytes": 10000000,
                    "rollOnFileSizeLimit": true,
                    "retainedFileCountLimit": 10
                }
            }
        ]
    }
}
Environment variables
YAML
Serilog__MinimumLevel__Default=Information
Serilog__MinimumLevel__Override__Microsoft=Warning
Serilog__MinimumLevel__Override__System=Warning
Serilog__WriteTo__0__Args__outputTemplate=[{Timestamp:yyyy-MM-dd HH:mm:ss.fff} {Level:u3}] {Message:lj} <s:{SourceContext}>{NewLine}{Exception}
Serilog__WriteTo__0__Args__theme=Serilog.Sinks.SystemConsole.Themes.AnsiConsoleTheme::Code, Serilog.Sinks.Console
Serilog__WriteTo__0__Name=Console
Serilog__WriteTo__1__Args__fileSizeLimitBytes=10000000
Serilog__WriteTo__1__Args__path=/var/log/testlog_.txt
Serilog__WriteTo__1__Args__retainedFileCountLimit=10
Serilog__WriteTo__1__Args__rollingInterval=Day
Serilog__WriteTo__1__Args__rollOnFileSizeLimit=True
Serilog__WriteTo__1__Name=File

Don’t forget to prefix the variables with the component’s name
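
For example, to switch the file sink to compact JSON output instead of plain text, the formatter option mentioned above can be set on the File sink arguments. This is a sketch, assuming the Serilog.Formatting.Compact package is available to the component (which is not guaranteed); remember the component prefix:

YAML
Serilog__WriteTo__1__Args__formatter=Serilog.Formatting.Compact.CompactJsonFormatter, Serilog.Formatting.Compact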

Git

JSON
{
  "git": {
    "EndOfLine": "\n"
  }
}

The git section of the configuration contains only one field, EndOfLine. This field can most likely be left untouched. It can be changed if a .git folder is copied from a Unix system to a Windows system, or vice versa, where the end-of-line character is encoded differently.

Query

AppData JSON configuration file
JSON
  {
    "queryConfig": {
      "deploymentTimeoutSeconds": 30,
      "defaultWorkerTimeoutSeconds": 7200,
      "migrationTimeoutSeconds": 600
    }
  }

The values configured here are redundant, as they simply repeat the default timeout values.

Environment variables
YAML
queryConfig__deploymentTimeoutSeconds=30
queryConfig__defaultWorkerTimeoutSeconds=7200
queryConfig__migrationTimeoutSeconds=600

Don’t forget to prefix the variables with the component’s name (i.e. DFAKTO_METAVAULT_).

  • deploymentTimeoutSeconds: The number of seconds a single query can run against the database during (or during the analysis of) the deployment. Note that a deployment typically executes hundreds of small queries, so this should be pretty small. Defaults to 30 (seconds).

  • defaultWorkerTimeoutSeconds: The number of seconds, by default, a single query can run against the database during the execution of the Metavault workers. (BulkImportExport or DatavaultQuery). Defaults to 7200 (seconds, i.e. 2 hours). A single workflow step in the orchestrator may override this timeout using the worker parameter: TimeoutSeconds

  • migrationTimeoutSeconds: The number of seconds a single query can run against the database during a migration. This was especially relevant for the 3.0 migration. Defaults to 600 (seconds).
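
As an illustration, if some worker queries legitimately need more than the 2-hour default, only the worker timeout has to be overridden; the 4-hour value below (4 × 3600 = 14400 seconds) is purely an example:

YAML
DFAKTO_METAVAULT_queryConfig__defaultWorkerTimeoutSeconds=14400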

Other

Web server config

AppData JSON configuration file
JSON
{
  "Hsts": {
        "MaxAge": "00.00:05:00",
        "Preload": true
    }
}
Environment variables
YAML
Hsts__MaxAge=00.00:05:00
Hsts__Preload=True

Don’t forget to prefix the variables with the component’s name

 

HTTP Strict Transport Security (HSTS) is a simple and widely supported standard to protect visitors by ensuring that their browsers always connect to a website over HTTPS.

  • MaxAge: The time during which the browser should remember that the site is only to be accessed using HTTPS.

  • Preload: it is possible to enforce secure connections at a higher level, even before visiting a website for the first time, via the HSTS preload list. This is a list, managed by Google, of domain names that support HSTS by default.
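
The MaxAge value uses the .NET TimeSpan format (days.hours:minutes:seconds), so 00.00:05:00 in the example above means five minutes. A production deployment would typically use a much longer duration; the one-year value below is an illustrative sketch, not a beVault-specific recommendation:

YAML
DFAKTO_METAVAULT_Hsts__MaxAge=365.00:00:00
DFAKTO_METAVAULT_Hsts__Preload=True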

AppData JSON configuration file
JSON
{
  "Urls": "http://localhost:5000",
  "ForwardedHeadersOptions": {
      "ForwardedHeaders": "All"
  }
}
Environment variables
YAML
ForwardedHeadersOptions__ForwardedHeaders=All
Urls=http://localhost:5000

Don’t forget to prefix the variables with the component’s name

 

The ForwardedHeadersOptions section sets which forwarded headers from a reverse proxy are applied to incoming requests. You can most likely leave it at the default All value.

The accepted values are :

  • All : Process X-Forwarded-For, X-Forwarded-Host and X-Forwarded-Proto.

  • None : Do not process any forwarded headers

  • XForwardedFor : Process X-Forwarded-For, which identifies the originating IP address of the client.

  • XForwardedHost : Process X-Forwarded-Host, which identifies the original host requested by the client.

  • XForwardedProto : Process X-Forwarded-Proto, which identifies the protocol (HTTP or HTTPS) the client used to connect.

For more information, see https://docs.microsoft.com/en-us/aspnet/core/host-and-deploy/proxy-load-balancer?view=aspnetcore-5.0
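
If the reverse proxy should only be trusted for the client IP and the protocol, but not the host header, the flags can in principle be combined in a single comma-separated value. This is an assumption based on how .NET binds flag enums from configuration; verify it in your environment before relying on it:

YAML
DFAKTO_METAVAULT_ForwardedHeadersOptions__ForwardedHeaders=XForwardedFor,XForwardedProto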

Prometheus integration

Prometheus (https://prometheus.io/) is an open-source systems monitoring and alerting toolkit.

 

  • DefaultContextLabel: Metrics recorded are grouped into “Contexts”, for example a database context or application context. Metrics names should be unique per context. The default is “Application”.

  • Enabled: Allows recording of all metrics to be enabled/disabled, default is true.

  • ApdexTrackingEnabled: Allows enabling/disabling of calculating the apdex score on the overall responses times. Defaults to true. The Apdex (Application Performance Index) is used to monitor end-user satisfaction. It is an open industry standard that estimates the end user’s satisfaction level on an application’s response time through a score between 0 and 1.

  • apdexTSeconds: The Apdex T seconds value used in calculating the score on the samples collected.

  • IgnoredHttpStatusCodes: Allows specific HTTP status codes to be ignored when reporting response-related information, e.g. you might not want to monitor 404 status codes.

  • IgnoredRoutesRegexPatterns: A list of regex patterns used to exclude matching routes from metrics tracking.

  • Oauth2TrackingEnabled: Allows recording of all OAuth2 Client tracking to be enabled/disabled. Defaults to true.

  • MetricsEndPointEnabled: Allows enabling/disabling of the /metrics endpoint, when disabled will result in a 404 status code, the default is true.

  • MetricsTextEndpointEnabled: Allows enabling/disabling of the /metrics-text endpoint, when disabled will result in a 404 status code, the default is true.

  • EnvironmentInfoEndpointEnabled: Allows enabling/disabling of the /env endpoint, when disabled will result in a 404 status code, the default is true.

AppData JSON configuration file
JSON
{
    "MetricsOptions": {
        "DefaultContextLabel": "dFakto States",
        "Enabled": true
    },
    "MetricsWebTrackingOptions": {
        "ApdexTrackingEnabled": true,
        "ApdexTSeconds": 0.1,
        "IgnoredHttpStatusCodes": [ 404 ],
        "IgnoredRoutesRegexPatterns": [],
        "OAuth2TrackingEnabled": true
    },
    "MetricEndpointsOptions": {
        "MetricsEndpointEnabled": true,
        "MetricsTextEndpointEnabled": false,
        "EnvironmentInfoEndpointEnabled": true
    }
}
Environment variables
YAML
MetricEndpointsOptions__EnvironmentInfoEndpointEnabled=True
MetricEndpointsOptions__MetricsEndpointEnabled=True
MetricEndpointsOptions__MetricsTextEndpointEnabled=False
MetricsOptions__DefaultContextLabel=dFakto States
MetricsOptions__Enabled=True
MetricsWebTrackingOptions__ApdexTrackingEnabled=True
MetricsWebTrackingOptions__ApdexTSeconds=0.1
MetricsWebTrackingOptions__IgnoredHttpStatusCodes__0=404
MetricsWebTrackingOptions__OAuth2TrackingEnabled=True

Don’t forget to prefix the variables with the component’s name
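
Because IgnoredHttpStatusCodes is an array, additional status codes can be ignored by incrementing the index, following the same pattern as the Serilog WriteTo entries earlier on this page. The extra 401 entry below is purely illustrative (and the variables still need the component prefix):

YAML
MetricsWebTrackingOptions__IgnoredHttpStatusCodes__0=404
MetricsWebTrackingOptions__IgnoredHttpStatusCodes__1=401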

Sentry

Sentry (https://sentry.io/) is an application monitoring platform.

  • DSN: where to send events, so the events are associated with the correct project.

  • IncludeRequestPayload: whether we should send the request body to Sentry. This is done so that the request data can be read at a later point in case an error happens while processing the request.

  • SendDefaultPii: Whether we should report the user who made the request

  • MinimumBreadcrumbLevel: Configures the lowest level a message must have to become a breadcrumb. Breadcrumbs are the last (by default 100) log entries recorded before the event was sent to Sentry.

  • MinimumEventLevel: A LogLevel which indicates the minimum level a log message has to be sent to Sentry as an event. By default, this value is Error.

  • AttachStackTrace: Configures whether Sentry should generate and attach stack traces to capture message calls.

  • Debug: Turns debug mode on or off. If debug is enabled, Sentry will attempt to print out useful debugging information if something goes wrong with sending the event. The default is always false. It's generally not recommended to turn it on in production, though turning debug mode on will not cause any safety concerns.

  • DiagnosticsLevel: Debug by default.

  • DefaultTags: Default tags to add to all events.

AppData JSON configuration file
JSON
{
  "Sentry": {
      "Dsn": "",
      "IncludeRequestPayload": true,
      "SendDefaultPii": true,
      "MinimumBreadcrumbLevel": "Debug",
      "MinimumEventLevel": "Warning",
      "AttachStackTrace": true,
      "Debug": true,
      "DiagnosticsLevel": "Error",
      "DefaultTags": {
          "client": "XXX"
      }
  }
}
Environment variables
YAML
Sentry__AttachStackTrace=True
Sentry__Debug=True
Sentry__DefaultTags__client=XXX
Sentry__DiagnosticsLevel=Error
Sentry__Dsn=
Sentry__IncludeRequestPayload=True
Sentry__MinimumBreadcrumbLevel=Debug
Sentry__MinimumEventLevel=Warning
Sentry__SendDefaultPii=True

Don’t forget to prefix the variables with the component’s name
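
DefaultTags entries follow the same nested key pattern, so additional tags can be added next to the client tag; the environment tag below is a hypothetical example, not a key that beVault requires:

YAML
DFAKTO_METAVAULT_Sentry__DefaultTags__client=XXX
DFAKTO_METAVAULT_Sentry__DefaultTags__environment=production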
