Question

How does inserting time series datapoints work?

  • 6 September 2024
  • 1 reply
  • 42 views

adrianosvieira

I am doing some hands-on exercises, but I am stuck at one point.

The task is to add some time series data:

- Create a time series object for each country asset in CDF called <country>_population and associate it with its corresponding country asset.

  •   Remember to associate the data as well to the data set that you created.
  •   As an example, the time series for Aruba would be called Aruba_population.

- Load the data from populations_postprocessed.csv into a pandas dataframe.
- Insert the data for each country in this dataframe using client.time_series.data.insert_dataframe.
- As a check, retrieve the latest population data for the countries of Latvia, Guatemala, and Benin.
- Calculate the total population of the Europe region using the Asset Hierarchy and the time series data.

The point I am stuck on:

=> “Insert the data for each country in this dataframe using client.time_series.data.insert_dataframe”.

 

I am not sure what this means, although I did an earlier exercise …

Insert the population data as a dataframe:
client.time_series.data.insert_dataframe(df, external_id_headers=False)

 

Could you please explain how all of this works?

Hi! Sorry for the late reply.

I am not familiar with the exercise that you are doing, but I can try to explain how the `insert_dataframe` function behaves. It accepts a pandas data frame as the first argument. To create a pandas data frame from a csv file, you do the following:

import pandas as pd

df = pd.read_csv("population.csv")

To insert a data frame into CDF as time series data, the index must be a date time index (a pandas `DatetimeIndex`), and the column names must be the external ids of time series that already exist in CDF.
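Those two requirements can be checked locally before calling `insert_dataframe`. The following is only a minimal sketch: `find_problems` is a hypothetical helper (not part of the SDK), and the set of existing external ids is assumed to be something you have collected yourself, for example from the time series you created earlier.

```python
import pandas as pd


def find_problems(df, existing_external_ids):
    """Return reasons why df does not match the shape described above.

    Hypothetical helper for illustration; it does not talk to CDF.
    """
    problems = []
    # Requirement 1: the index must be a DatetimeIndex
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    # Requirement 2: every column name must be a known external id
    missing = [col for col in df.columns if col not in existing_external_ids]
    if missing:
        problems.append(f"no time series for columns: {missing}")
    return problems


# A frame straight from a CSV: integer index, generic column names
raw = pd.DataFrame({"Year": [1960, 1961], "Value": [54608, 55811]})
print(find_problems(raw, {"Aruba_population"}))

# The same values reshaped: datetime index, column named after the external id
ready = pd.DataFrame(
    {"Aruba_population": [54608, 55811]},
    index=pd.to_datetime(["1960", "1961"], format="%Y"),
)
print(find_problems(ready, {"Aruba_population"}))
```

The first frame reports both problems, the second reports none.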

Example

Let’s say that `population.csv` contains this:

      Country Name Country Code  Year     Value
0            Aruba          ABW  1960     54608
1            Aruba          ABW  1961     55811
2            Aruba          ABW  1962     56682
3            Aruba          ABW  1963     57475
4            Aruba          ABW  1964     58178
...            ...          ...   ...       ...
16395     Zimbabwe          ZWE  2017  14751101
16396     Zimbabwe          ZWE  2018  15052184
16397     Zimbabwe          ZWE  2019  15354608
16398     Zimbabwe          ZWE  2020  15669666
16399     Zimbabwe          ZWE  2021  15993524

[16400 rows x 4 columns]

If we want to create a time series that shows the population of Norway, we can do the following:

import pandas as pd
from cognite.client.data_classes import TimeSeries

# `client` is assumed to be an already authenticated CogniteClient
df = pd.read_csv("population.csv")
# Extract only the rows for Norway (copy so that we can modify the result safely)
df = df[df["Country Name"] == "Norway"].copy()

# Convert the index to a date time index. We read the time from the Year column.
df.index = pd.to_datetime(df["Year"], format="%Y")

# Create a time series
client.time_series.create(
    TimeSeries(external_id="population_in_norway")
)

# Rename the column to the external id of our time series
df["population_in_norway"] = df["Value"]

# Remove the other columns
df = df[["population_in_norway"]]

# Insert the rows as data points
client.time_series.data.insert_dataframe(df)

# You can fetch the same data as a data frame like this:
client.time_series.data.retrieve_dataframe(
    external_id="population_in_norway"
)
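For the exercise itself, where every country needs its own time series, the same idea can be applied to the whole table at once by pivoting it so that each country becomes one column. This is a minimal sketch, assuming the CSV has the `Country Name`, `Year`, and `Value` columns shown above and that the time series are named `<country>_population` as the exercise describes. The inline sample stands in for the real file, and `to_wide` is a hypothetical helper name.

```python
import io

import pandas as pd

# A few rows copied from the table above, standing in for the real CSV file
SAMPLE_CSV = """Country Name,Country Code,Year,Value
Aruba,ABW,1960,54608
Aruba,ABW,1961,55811
Zimbabwe,ZWE,2020,15669666
Zimbabwe,ZWE,2021,15993524
"""


def to_wide(df):
    """Reshape long-format rows into one column per country, indexed by time."""
    df = df.copy()
    # Build the date time index values from the Year column
    df["time"] = pd.to_datetime(df["Year"].astype(str), format="%Y")
    # Column names must match the external ids of the time series
    df["external_id"] = df["Country Name"] + "_population"
    wide = df.pivot(index="time", columns="external_id", values="Value")
    wide.columns.name = None
    return wide


wide = to_wide(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(wide)

# With an authenticated client, and the time series already created:
# client.time_series.data.insert_dataframe(wide)
```

Years that are missing for a country show up as NaN in the wide frame. As far as I know `insert_dataframe` skips those by default, but please check the SDK documentation for the version you are using.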

EDIT: There is more information on the Python SDK website: https://cognite-sdk-python.readthedocs-hosted.com/en/latest/time_series.html#insert-pandas-dataframe

