Data Connection
ABetterchoice enables seamless data ingestion from your Data Warehouse, allowing you to transmit raw events and pre-computed metrics for tracking and experimental evaluation. We currently support data ingestion from these providers:
- BigQuery
- GCS
1. Definitions and Requirements of the Data Tables
1.1 Exposure Tables
The experiment exposure table records the mapping between users and experiment IDs; experiment metrics are calculated from this table.
Update method = Daily increment
You don't need to do anything! The exposure data is automatically reported to GCS through the ABC SDK, and the corresponding aggregation is performed automatically. You can see the exposure results T+1 days after a successful SDK report.
Notes: By default, three folders are created under the root directory of the bucket. Please do not modify or delete them, or the calculation will be affected (a sketch for checking these folders follows the list):
- abc_exp_expose/
- abc_exp_expose_hour/
- abc_exp_expose_realtime/
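If you want to confirm that the SDK has started reporting, you can list these prefixes in the bucket. This is only a minimal sketch using the google-cloud-storage Python client; the bucket name "your-abc-bucket" is a placeholder, not a real ABC bucket name.

```python
# Minimal sketch: check that the three ABC exposure prefixes exist in the bucket.
# Assumes google-cloud-storage is installed and credentials are configured;
# "your-abc-bucket" is a placeholder for the bucket used by your project.
from google.cloud import storage

EXPECTED_PREFIXES = [
    "abc_exp_expose/",
    "abc_exp_expose_hour/",
    "abc_exp_expose_realtime/",
]

client = storage.Client()
for prefix in EXPECTED_PREFIXES:
    # max_results=1 is enough to tell whether the prefix is populated
    blobs = list(client.list_blobs("your-abc-bucket", prefix=prefix, max_results=1))
    status = "found" if blobs else "missing or empty"
    print(f"{prefix}: {status}")
```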
1.2 Assignment Tables
The assignment table stores the user's behavior data, for example user click events.
Update method = Full update
Notes:
- The types and names of ds and user_id must follow the requirements shown above.
- You can change the names and types of the behavioural dimension and measurement columns as you like (0-n columns are allowed for behavioural dimensions; 1-n columns are required for measurements). See the example schema sketch after this list.
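As an illustration of these rules, here is a minimal PySpark schema sketch for an assignment table. The ds column follows section 2.1; the exact type of user_id should match the requirement table above (StringType is only an assumption here), and the dimension and measurement column names (country, click_cnt) are hypothetical examples, not names ABC requires.

```python
# Minimal sketch of an assignment-table schema, assuming PySpark.
# ds is an integer in YYYYMMDD form (see section 2.1); user_id's type should
# match the requirement table above -- StringType is only an assumption.
# "country" (behavioural dimension) and "click_cnt" (measurement) are
# hypothetical example columns.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, LongType

spark = SparkSession.builder.appName("abc_assignment_example").getOrCreate()

schema = StructType([
    StructField("ds", IntegerType(), nullable=False),      # partition field, YYYYMMDD
    StructField("user_id", StringType(), nullable=False),  # assumed type
    StructField("country", StringType(), nullable=True),   # behavioural dimension (0-n allowed)
    StructField("click_cnt", LongType(), nullable=True),   # measurement (1-n required)
])

rows = [(20240101, "u_001", "US", 3), (20240101, "u_002", "DE", 7)]
df = spark.createDataFrame(rows, schema=schema)
df.show()
```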
2. Storage Requirements
ABC accesses the data you provide according to the specified partition name and file type. Therefore, please ensure your data adheres to the following requirements:
2.1 Data Partition Style
ABC uniformly adopts "ds" as the partition field. The value of "ds" uses the form "YYYYMMDD", and its type is Spark's IntegerType. An example is as follows:
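The following is a minimal sketch of deriving a conforming "ds" column, assuming PySpark; the source DataFrame and its "event_date" column are hypothetical examples.

```python
# Minimal sketch: derive a "ds" partition column in YYYYMMDD form with IntegerType.
# Assumes PySpark; "events_df" and its "event_date" column are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("abc_ds_partition_example").getOrCreate()

events_df = spark.createDataFrame(
    [("u_001", "2024-01-01"), ("u_002", "2024-01-02")],
    ["user_id", "event_date"],
)

# Format the date as YYYYMMDD and cast it to IntegerType, e.g. 20240101.
partitioned_df = events_df.withColumn(
    "ds", F.date_format(F.to_date("event_date"), "yyyyMMdd").cast(IntegerType())
)
partitioned_df.printSchema()
```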
2.2 Data Format
We use Parquet as the file storage format. If you are not familiar with Parquet, you can find more information in Documentation | Apache Parquet. As a widely used storage format, it has many use cases worth exploring.
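Putting sections 2.1 and 2.2 together, the sketch below writes a DataFrame as Parquet partitioned by "ds", assuming PySpark; the output path "gs://your-abc-bucket/your_assignment_table/" is a hypothetical placeholder.

```python
# Minimal sketch: write a DataFrame as Parquet, partitioned by the "ds" field.
# Assumes PySpark and a DataFrame that already has an IntegerType "ds" column
# (see section 2.1). The GCS output path is a hypothetical placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("abc_parquet_write_example").getOrCreate()

df = spark.createDataFrame(
    [(20240101, "u_001", 3), (20240102, "u_002", 7)],
    ["ds", "user_id", "click_cnt"],
)

(
    df.write
    .mode("overwrite")
    .partitionBy("ds")  # creates ds=YYYYMMDD/ sub-directories
    .parquet("gs://your-abc-bucket/your_assignment_table/")
)
```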