Data Scientist Program

Eman Mahmoud

Time Series Analysis

Time-Series-Analysis-of-NAICS

The North American Industry Classification System (NAICS) is an industry classification system developed by the statistical agencies of Canada, Mexico, and the United States. NAICS is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. This is a step-by-step analysis of the data, with a blog post found here: https://www.datainsightonline.com/post/analysing-the-naics-time-series-data


15 CSV files whose names begin with RTRA. These files contain employment data by industry at three levels of aggregation: 2-digit, 3-digit, and 4-digit NAICS. The columns are:

(i) SYEAR: Survey Year

(ii) SMTH: Survey Month

(iii) NAICS: Industry name, with the associated NAICS code in brackets

(iv) _EMPLOYMENT_: Employment
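Because the survey date is split across SYEAR and SMTH, the two have to be combined into a single monthly timestamp before any time-series work. A minimal sketch on invented rows (the industry labels and employment numbers are made up for illustration) that passes year, month and a fixed day of 1 to pd.to_datetime, so single-digit months need no string padding:

```python
import pandas as pd

# Invented sample rows following the RTRA column layout (not real data)
rtra = pd.DataFrame({
    "SYEAR": [1997, 1997, 1998],
    "SMTH": [1, 12, 3],
    "NAICS": ["Utilities [22]", "Utilities [22]", "Construction [23]"],
    "_EMPLOYMENT_": [1000, 1250, 2000],
})

# pd.to_datetime can assemble dates from a frame with year/month/day columns;
# day is fixed to 1 so every row maps to the first day of its survey month
rtra["date"] = pd.to_datetime(
    pd.DataFrame({"year": rtra["SYEAR"], "month": rtra["SMTH"], "day": 1})
)
print(rtra[["SYEAR", "SMTH", "date"]])
```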


LMO Detailed Industries by NAICS: an Excel file for mapping the RTRA data to the desired industries. Its first column lists 59 frequently used industries, and its second column gives their NAICS definitions. Using these NAICS definitions and the RTRA data, the goal is to create a monthly employment series from 1997 to 2018 for these 59 industries.
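The target output can be pictured as a grid with one row per industry per month. January 1997 through December 2018 is 22 years, so 22 × 12 = 264 months, and 59 industries × 264 months = 15,576 rows. A sketch that builds that empty grid with pandas; the industry names are hypothetical placeholders, not the real LMO list:

```python
import pandas as pd

# Hypothetical placeholder names standing in for the 59 LMO industries
industries = [f"industry_{i}" for i in range(1, 60)]

# Month-start dates from January 1997 through December 2018
months = pd.date_range("1997-01-01", "2018-12-01", freq="MS")

# Every (industry, month) combination; employment would be filled in from the RTRA data
grid = pd.MultiIndex.from_product(
    [industries, months], names=["industry", "date"]
).to_frame(index=False)
print(len(months), len(grid))  # 264 15576
```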


I will merge the LMO Detailed Industries by NAICS file separately with the 2-digit, 3-digit, and 4-digit NAICS data, and then combine the three merged results into one dataset.

First, I read all the files and preprocess them so they are ready for merging.

Load libraries:



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import glob

files = glob.glob(r"C:/Users/21AK22/Documents/Data Insight/A_NEWLY_HIRED_DATA_ANALYST/*.csv")
data_2digit = pd.DataFrame()
data_3digit = pd.DataFrame()
data_4digit = pd.DataFrame()

# Route each RTRA file to the frame for its aggregation level, based on its file name.
# ignore_index=True keeps a unique 0..n-1 index, which later column assignments rely on.
for file in files:
    if re.search('_2NAICS', file):
        df = pd.read_csv(file)
        data_2digit = pd.concat([data_2digit, df], ignore_index=True)
    elif re.search('_3NAICS', file):
        df = pd.read_csv(file)
        data_3digit = pd.concat([data_3digit, df], ignore_index=True)
    elif re.search('_4NAICS', file):
        df = pd.read_csv(file)
        data_4digit = pd.concat([data_4digit, df], ignore_index=True)
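An alternative to growing each DataFrame inside the loop (which copies the accumulated data on every iteration) is to collect the per-file frames in lists and call pd.concat once per level. The sketch below writes three tiny CSV files into a temporary folder so it runs anywhere; the file names only imitate the RTRA naming pattern, and the numbers are invented:

```python
import glob
import os
import re
import tempfile

import pandas as pd

# Throwaway folder with made-up files that mimic the RTRA naming pattern
folder = tempfile.mkdtemp()
samples = {
    "RTRA_Employ_2NAICS_97_99.csv": "SYEAR,SMTH,NAICS,_EMPLOYMENT_\n1997,1,Utilities [22],1000\n",
    "RTRA_Employ_2NAICS_00_05.csv": "SYEAR,SMTH,NAICS,_EMPLOYMENT_\n2000,1,Utilities [22],1100\n",
    "RTRA_Employ_4NAICS_97_99.csv": "SYEAR,SMTH,NAICS,_EMPLOYMENT_\n1997,1,2211,900\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as fh:
        fh.write(text)

# Collect frames per aggregation level ("2", "3" or "4" from the file name)
frames = {"2": [], "3": [], "4": []}
for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    match = re.search(r"_([234])NAICS", os.path.basename(path))
    if match:
        frames[match.group(1)].append(pd.read_csv(path))

# Concatenate once per level; ignore_index=True gives a clean 0..n-1 index
combined = {
    level: pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()
    for level, dfs in frames.items()
}
print({level: len(df) for level, df in combined.items()})  # {'2': 2, '3': 0, '4': 1}
```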

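I use two functions, separate_NAICS_code and Date_column, to preprocess the data.


def separate_NAICS_code(df):
    # Split "Industry name [code]" into the name and the bracketed code
    df1 = pd.DataFrame(df.NAICS.astype('str').str.split('[').to_list(), columns=['NAICS', 'NAICS_CODE'])
    # Drop the closing bracket and turn code ranges such as 31-33 into 31,33
    df1['NAICS_CODE'] = df1.NAICS_CODE.astype('str').str.strip(']').str.replace('-', ',', regex=False)
    # Assign by position (.to_numpy()) so the result does not depend on df's index labels;
    # strip() removes the space that was left before the bracket
    df['NAICS'] = df1['NAICS'].str.strip().to_numpy()
    df['NAICS_CODE'] = df1['NAICS_CODE'].to_numpy()
    return df

def Date_column(df):
    # Build a monthly date from survey year and month; zfill(2) pads single-digit months (1 -> 01)
    df['date'] = pd.to_datetime(df.SYEAR.astype('str') + df.SMTH.astype('str').str.zfill(2), format='%Y%m')
    df = df.sort_values('date')
    return df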

Preprocess data_2digit and data_3digit:

- Separate each NAICS name from its code and put the code in a new column, using the separate_NAICS_code function.

- Create a date column from SYEAR and SMTH, using the Date_column function.

Preprocess data_4digit using only the Date_column function.
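For comparison, the name/code split can also be written with a regular expression via Series.str.extract, which trims the space before the bracket in the same step. A sketch on invented 2-digit labels; the hyphen-to-comma replacement mirrors separate_NAICS_code, and rows without a bracketed code would come back as NaN:

```python
import pandas as pd

# Invented labels in the "Industry name [code]" format of the 2- and 3-digit files
df = pd.DataFrame({"NAICS": ["Utilities [22]", "Manufacturing [31-33]", "Construction [23]"]})

# Group 0: text before the bracket (whitespace trimmed); group 1: text inside the brackets
parts = df["NAICS"].str.extract(r"^\s*(.*?)\s*\[(.*)\]\s*$")
df["NAICS"] = parts[0]
# Same hyphen-to-comma replacement as separate_NAICS_code
df["NAICS_CODE"] = parts[1].str.replace("-", ",", regex=False)
print(df)
```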


data_2digit.head(2)
data_3digit.head(2)
data_4digit.head(2)






Read and preprocess the LMO_Detailed_Industries_by_NAICS file:

- Replace & in the NAICS column with a comma.

- Convert the NAICS column to string type.


LMO_Detailed_Industries_by_NAICS = pd.read_excel(r"C:/Users/21AK22/Documents/Data Insight/A_NEWLY_HIRED_DATA_ANALYST/LMO_Detailed_Industries_by_NAICS.xlsx")
# Cast to string first so purely numeric codes (e.g. 111) are handled too, then turn "&" separators into commas
LMO_Detailed_Industries_by_NAICS['NAICS'] = (
    LMO_Detailed_Industries_by_NAICS['NAICS'].astype('str').str.replace('&', ',', regex=False)
)
print(LMO_Detailed_Industries_by_NAICS.head())

Next, I split every value in the NAICS column that contains a comma, so that each row holds a single NAICS code.
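One way to do this split in pandas is str.split followed by DataFrame.explode, which turns a row such as "111 , 112" into two rows for the same industry. A sketch with invented LMO-style rows; the column name LMO_Detailed_Industry is an assumption about the spreadsheet's header:

```python
import pandas as pd

# Invented rows in the shape of the LMO mapping file (column names assumed)
lmo = pd.DataFrame({
    "LMO_Detailed_Industry": ["Farms", "Utilities", "Fishing, hunting and trapping"],
    "NAICS": ["111 , 112", "22", "114"],
})

# Split on commas, put one code per row, and strip stray spaces around each code
lmo["NAICS"] = lmo["NAICS"].str.split(",")
lmo = lmo.explode("NAICS", ignore_index=True)
lmo["NAICS"] = lmo["NAICS"].str.strip()
print(lmo)
```

Only the NAICS column is split, so industry names that themselves contain commas are left intact.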


- Left-merge data_2digit with LMO_Detailed_Industries_by_NAICS.

- Left-merge data_3digit with LMO_Detailed_Industries_by_NAICS.

- Merge data_4digit with LMO_Detailed_Industries_by_NAICS.

Then combine the three merged DataFrames into one.
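A sketch of the three merges and the final combination, on tiny invented frames. It assumes each RTRA frame is already preprocessed, that the 2- and 3-digit frames carry a string NAICS_CODE column while the 4-digit frame keeps its code in NAICS, and that the LMO frame has one code per row; these column names are assumptions. Rows left without an LMO industry after the left merge are dropped, and the final groupby sum (so that codes such as 111 and 112 add up to one industry) is an extra step of this sketch, not something stated in the post:

```python
import pandas as pd

# LMO mapping with one code per row (invented values, column names assumed)
lmo = pd.DataFrame({
    "LMO_Detailed_Industry": ["Utilities", "Farms", "Farms", "Telecommunications"],
    "NAICS": ["22", "111", "112", "5173"],
})

date = pd.Timestamp("1997-01-01")
# Invented, already-preprocessed RTRA frames at each aggregation level
data_2digit = pd.DataFrame({"date": [date], "NAICS_CODE": ["22"], "_EMPLOYMENT_": [1000]})
data_3digit = pd.DataFrame({"date": [date, date], "NAICS_CODE": ["111", "113"], "_EMPLOYMENT_": [500, 300]})
data_4digit = pd.DataFrame({"date": [date], "NAICS": [5173], "_EMPLOYMENT_": [200]})


def attach_industry(rtra, code_col):
    """Left-merge an RTRA frame onto the LMO mapping and keep matched rows only."""
    left = rtra[["date", "_EMPLOYMENT_"]].assign(code=rtra[code_col].astype(str))
    merged = left.merge(lmo, how="left", left_on="code", right_on="NAICS")
    # Codes with no LMO industry get NaN after a left merge; drop them
    merged = merged.dropna(subset=["LMO_Detailed_Industry"])
    return merged[["date", "LMO_Detailed_Industry", "_EMPLOYMENT_"]]


final = pd.concat(
    [
        attach_industry(data_2digit, "NAICS_CODE"),
        attach_industry(data_3digit, "NAICS_CODE"),
        attach_industry(data_4digit, "NAICS"),
    ],
    ignore_index=True,
)
# Several codes can map to one industry, so sum employment per industry and month
final = final.groupby(["date", "LMO_Detailed_Industry"], as_index=False)["_EMPLOYMENT_"].sum()
print(final)
```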

Result:



Some visualizations of the final data:

Employment in the Utilities industry 1997-2018
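The series behind a chart like this can be pulled out of the merged data with a boolean filter. A sketch on invented numbers, assuming the final frame has date, LMO_Detailed_Industry and _EMPLOYMENT_ columns; the resulting Series could then be passed to matplotlib or seaborn for plotting:

```python
import pandas as pd

# Invented stand-in for the final merged data (column names assumed)
final = pd.DataFrame({
    "date": pd.to_datetime(["1997-02-01", "1997-01-01", "1997-01-01", "1997-02-01"]),
    "LMO_Detailed_Industry": ["Utilities", "Utilities", "Farms", "Farms"],
    "_EMPLOYMENT_": [1100, 1000, 500, 450],
})

# Keep only the Utilities rows, index them by date and sort chronologically
utilities = (
    final.loc[final["LMO_Detailed_Industry"] == "Utilities"]
    .set_index("date")["_EMPLOYMENT_"]
    .sort_index()
)
print(utilities)
```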

Employment across industries, 1997-2018
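For a comparison across industries, one option is to total employment per industry over the whole period and sort the result, which gives the bar heights for a ranked bar chart. A sketch on invented data with the same assumed columns:

```python
import pandas as pd

# Invented stand-in for the final merged data (column names assumed)
final = pd.DataFrame({
    "date": pd.to_datetime(["1997-01-01", "1997-02-01"] * 3),
    "LMO_Detailed_Industry": ["Utilities", "Utilities", "Farms", "Farms", "Construction", "Construction"],
    "_EMPLOYMENT_": [1000, 1100, 500, 450, 3000, 3200],
})

# Sum employment per industry over all months, largest first
totals = (
    final.groupby("LMO_Detailed_Industry")["_EMPLOYMENT_"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```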




Source code: https://github.com/eman888991/Data-Insight/blob/main/Project_Time_Series_Analysis_of_NAICS.ipynb
