Planned for development

Access to raw table metadata through API or SDK

Related products: API and SDKs
  • November 21, 2024
  • 1 reply

Hi

For a dashboard use case we are working on, I want to extract a list of the column names in each raw table in our staging area. At the moment, there does not seem to be a way of doing this. I have made two very hacky ways of accessing this information (see the code example below), but they are either very time-consuming because they infer the raw schema, or they return nothing because the table has too many columns and the request times out. This makes the approach unfeasible when running the scripts for our whole environment, which would need to happen regularly. I feel like there has to be a better way of doing this. I know raw is a schemaless service, but the columns do exist. Having this information would greatly improve our efforts to get a better overview of our data.

from typing import Any, Dict, List

from pydantic import BaseModel


class RawTable(BaseModel):
    database: str
    table: str

    def to_friendly_name(self) -> str:
        return f"{self.database}.{self.table}"

    def get_inferred_raw_schema(self, cognite_client) -> Dict[str, Any]:
        # Slow: the transformations preview infers the schema from the sampled rows.
        schema = cognite_client.transformations.preview(
            query=f"select * from `{self.database}`.`{self.table}` limit 100"
        )
        return schema.schema.dump()

    def get_raw_schema_from_profiler(self, cognite_client) -> List[str]:
        # Times out when the table has too many columns.
        res = cognite_client.post(
            url="/api/v1/projects/[INSERT_PROJECT]/profiler/raw",
            json={"database": self.database, "table": self.table, "limit": 1000},
        )
        return list(res.json()["columns"].keys())
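
A third option that avoids schema inference could be to read a small sample of rows directly and take the union of their column keys. The sketch below is only a rough illustration under assumptions: it relies on the SDK's `raw.rows.list(db_name, table_name, limit=...)` call returning rows that expose a `columns` dict, and the helper names `collect_column_names` and `sample_raw_columns` are hypothetical. It only sees columns that appear in the sampled rows, so sparsely populated columns can be missed.

```python
from typing import Any, Iterable, List, Mapping


def collect_column_names(rows: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the union of keys over the given rows, in first-seen order."""
    seen: dict = {}
    for columns in rows:
        for name in columns:
            # A dict keeps insertion order, so this de-duplicates while
            # preserving the order in which columns were first seen.
            seen.setdefault(name, None)
    return list(seen)


def sample_raw_columns(
    cognite_client, database: str, table: str, sample_size: int = 100
) -> List[str]:
    """Sample rows from a raw table and collect the column names they carry.

    Assumption: `cognite_client.raw.rows.list` accepts the database name,
    the table name and a `limit`, and each returned row has a `columns` dict.
    """
    rows = cognite_client.raw.rows.list(database, table, limit=sample_size)
    # Treat rows without any columns as empty rather than failing.
    return collect_column_names(row.columns or {} for row in rows)
```

Called as `sample_raw_columns(cognite_client, "my_db", "my_table")`, this should cost one row-listing request per table, and a larger `sample_size` trades speed for better coverage of rarely filled columns.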

 

Thank you!

Sebastian

1 reply

Jørgen Lund
Seasoned Practitioner
  • Product Manager
  • January 3, 2025

Hi @sebastianheibo.akso, thanks for submitting this product idea.

We’re planning to support optional table schema definitions for Raw. Transformations, and use cases like yours, could then use them instead of the automatic schema inference that is the default today.

It is a bit too soon to give a concrete timeline, but it will likely be after Q2 2025.