How does inserting time series datapoints work?

Question

I am doing some hands-on exercises, but I am stuck at one point.

The task is to add some time series data:

- Create a time series object for each country asset in CDF called <country>_population and associate it with its corresponding country asset.

Remember to also associate the data with the data set that you created.
As an example, the time series for Aruba would be called Aruba_population.

- Load the data from populations_postprocessed.csv into a pandas dataframe.
- Insert the data for each country in this dataframe using client.time_series.data.insert_dataframe.
- As a check, retrieve the latest population data for the countries of Latvia, Guatemala, and Benin.
- Calculate the total population of the Europe region using the Asset Hierarchy and the time series data.
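A minimal sketch of what "insert the data for each country" can look like. It assumes that `populations_postprocessed.csv` has the same long format as the `population.csv` shown in the answer (columns `Country Name`, `Year`, `Value`), which is not confirmed by the exercise, and that the `<country>_population` time series already exist. Pivoting produces one column per country, named after its external id, on a datetime index, so one `insert_dataframe` call could cover every country. The helper name `to_insertable_frame` is hypothetical, and the sample rows are taken from the answer's excerpt.

```python
import pandas as pd


def to_insertable_frame(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot a long (country, year, value) frame into one column per time series.

    Assumes columns named "Country Name", "Year" and "Value"; adjust these to
    the real headers of populations_postprocessed.csv. Raises ValueError if a
    (Year, Country Name) pair appears more than once.
    """
    wide = long_df.pivot(index="Year", columns="Country Name", values="Value")
    # Column headers become the external ids, e.g. "Aruba" -> "Aruba_population"
    wide.columns = [f"{country}_population" for country in wide.columns]
    # A datetime index is required; each year is read as 1 January of that year
    wide.index = pd.to_datetime(wide.index.astype(str), format="%Y")
    # Years a country has no value for become NaN in its column
    return wide


sample = pd.DataFrame(
    {
        "Country Name": ["Aruba", "Aruba", "Zimbabwe", "Zimbabwe"],
        "Country Code": ["ABW", "ABW", "ZWE", "ZWE"],
        "Year": [1960, 1961, 2020, 2021],
        "Value": [54608, 55811, 15669666, 15993524],
    }
)
wide = to_insertable_frame(sample)
print(wide)
```

With an authenticated client, `client.time_series.data.insert_dataframe(wide)` would then write all columns in one call. Recent SDK versions skip NaN values by default (`dropna=True`); on older versions it may be safer to drop them first.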

The point I am stuck on:

=> “Insert the data for each country in this dataframe using client.time_series.data.insert_dataframe”.

I am not sure what this means, although I had done an earlier exercise …

“Insert the population data as a dataframe”:
client.time_series.data.insert_dataframe(df, external_id_headers=False)
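In that earlier call, `external_id_headers=False` tells the SDK to read the column headers as internal (numeric) time series ids instead of external ids. With the default (`True`) the headers are read as external ids, which is what the `<country>_population` naming allows. A minimal pandas sketch of converting a frame from external-id headers to internal-id headers; the helper name is hypothetical and the id `123456789` is a made-up placeholder.

```python
import pandas as pd


def to_internal_id_headers(df: pd.DataFrame, ids_by_external_id: dict) -> pd.DataFrame:
    """Rename external-id column headers to internal ids (hypothetical helper).

    The result is meant for insert_dataframe(df, external_id_headers=False).
    """
    missing = set(df.columns) - set(ids_by_external_id)
    if missing:
        raise KeyError(f"No internal id known for: {sorted(missing)}")
    return df.rename(columns=ids_by_external_id)


frame = pd.DataFrame(
    {"Aruba_population": [54608, 55811]},
    index=pd.to_datetime(["1960", "1961"], format="%Y"),
)
# Placeholder id; a real one could be read from
# client.time_series.retrieve(external_id="Aruba_population").id
by_id = to_internal_id_headers(frame, {"Aruba_population": 123456789})
print(by_id)
```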

Could you explain to me how all of this works, please?

Sigurd Holsen · Answer

Hi! Sorry for the late reply. I am not familiar with the exercise that you are doing, but I can try to explain how the `insert_dataframe` function behaves. It accepts a pandas data frame as the first argument. To create a pandas data frame from a CSV file, you do the following:

```python
import pandas as pd

df = pd.read_csv("population.csv")
```

To insert a data frame into CDF as time series data points, the index must be a datetime index, and the columns must be the external ids of time series that already exist.

Example

Let's say that `population.csv` contains this:

```
      Country Name Country Code  Year     Value
0            Aruba          ABW  1960     54608
1            Aruba          ABW  1961     55811
2            Aruba          ABW  1962     56682
3            Aruba          ABW  1963     57475
4            Aruba          ABW  1964     58178
...            ...          ...   ...       ...
16395     Zimbabwe          ZWE  2017  14751101
16396     Zimbabwe          ZWE  2018  15052184
16397     Zimbabwe          ZWE  2019  15354608
16398     Zimbabwe          ZWE  2020  15669666
16399     Zimbabwe          ZWE  2021  15993524

[16400 rows x 4 columns]
```

If we want to create a time series that shows the population in Norway, we can do the following:

```python
import pandas as pd
from cognite.client.data_classes import TimeSeries

# `client` is an authenticated CogniteClient (setup not shown)
df = pd.read_csv("population.csv")
# Extract only the rows for Norway (copy so we don't modify a view)
df = df[df["Country Name"] == "Norway"].copy()
# Convert the index to a datetime index. We read the time from the Year column.
df.index = pd.to_datetime(df["Year"].astype(str), format="%Y")
# Create a time series
client.time_series.create(TimeSeries(external_id="population_in_norway"))
# Copy the values into a column named after the external id of our time series
df["population_in_norway"] = df["Value"]
# Remove the other columns
df = df[["population_in_norway"]]
# Insert the rows as data points
client.time_series.data.insert_dataframe(df)
# You can fetch the same data as a data frame like this.
# Recent SDK versions start retrieval at 1970-01-01 UTC by default,
# so pass an earlier `start` to include data points before 1970.
client.time_series.data.retrieve_dataframe(external_id="population_in_norway")
```

EDIT: There is more information on the Python SDK website: https://cognite-sdk-python.readthedocs-hosted.com/en/latest/time_series.html#insert-pandas-dataframe
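The two requirements above (a datetime index, and columns that are external ids of time series that already exist) can be checked locally before calling `insert_dataframe`. A minimal sketch, where `check_insertable` is a hypothetical helper and `existing_external_ids` is assumed to be fetched from CDF beforehand (for example by listing the time series in your data set):

```python
import pandas as pd


def check_insertable(df: pd.DataFrame, existing_external_ids: set) -> list:
    """Return the problems that would stop df from being inserted as-is."""
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append(f"index is {type(df.index).__name__}, not DatetimeIndex")
    unknown = [col for col in df.columns if col not in existing_external_ids]
    if unknown:
        problems.append(f"no time series with external id: {unknown}")
    return problems


raw = pd.DataFrame({"Year": [1960, 1961], "Value": [54608, 55811]})
ready = pd.DataFrame(
    {"Aruba_population": [54608, 55811]},
    index=pd.to_datetime(raw["Year"].astype(str), format="%Y"),
)
known = {"Aruba_population"}
print(check_insertable(raw, known))
print(check_insertable(ready, known))
```

An empty list only means the frame has the shape described above; it does not check access rights or data set membership.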
