Posts

Interview Questions Accenture (Pune)

Tell me about things you did in your project related to GCP? Legacy SQL vs Standard SQL What details we are getting from Execution Details Tab? How you will reduce query cost? Any impact of LIMIT on query cost On which type of columns we can do  PARTITIONING on Tables? What MERGE does in BigQuery? Suppose there are 2 columns in table. Column1 of TIMESTAMP, Column2 of STRING but has timestamp values. How will you calculate average of difference between these 2 columns? JSON file is on GCS. Load JSON data into a table using bq command line tool. There is CSV file present on GCS. Schema of CSV and Table is matching but There is error while loading csv file to Table? What will be the Possible Errors (Expected UTF8 vs UNICODE characters) What is SAFE_CAST()? Suppose you have a array of 100 elemnts. How will you get value of 98 element? (Expecting to use Array Functions, OFFSET or CARDINAL) What is RANK() and DENSE_RANK()? What is FIRST_VALUE() and LAST_VALUE()? What is LEAD() and LAG()? Wha

BigQuery Execution Details

Image
  How to use BigQuery execution plans to optimize query performance BigQuery breaks down your SQL into a series of execution steps (stages) and creates a query execution tree. The query execution tree consists of a series of leaf nodes and the master node. The execution plan is an ideal tool to understand how the query is turned into series of work units and it provides valuable insights on where the query is spending so much time. Finding the execution plan for your query To find the execution plan, Once you execute the query in the WebUI, Navigate to the the preview panel (The area right below the query editor), And click on the fourth tab "Execution details" to see the query plan To check the execution plan for older queries, you can navigate and find the query that you ran in the past using the "Query History" window and once you find the query, click on query to expand, and then click on "Open query in editor", Now you should see the "Execution d

CLUSTER BY

Image
  Within a partition, Bigquery uses the cluster keys to sort the data. You can specify up to four columns as part of the cluster by clause. It is recommended to use the columns most likely used in the filter conditions as part of the cluster by clause

PARTITION BY

  "Partition by" as the name specifies, splits the table into multiple partitions. You can specify only one column as part of the PARTITION by clause. Its recommended to use the columns that are most likely to be used as part of the filter conditions as the partition key. Partition key helps with data pruning, reducing the amount of data scanned by the queries thereby reducing the overall cost. As of this writing, Bigquery only supports DATE, Timestamp and Numeric columns to be used as a Partition by column

Accessing BigQuery

  Accessing BigQuery  You can access BigQuery and run jobs from your web browser  Developers can use bq command line tool python-based tool that can access BigQuery from the command line  Developers can also leverage the Service API  RESTful API to access BigQuery programmatically  Requires authorization by OAuth2  Google client libraries for Python, JavaScript, PHP, etc.  Integration Possible with Third party Tools  Visualization and Statistical Tools tools like Tableau, QlikView, R, etc.  You can export data in a .csv file, jason or to Google Cloud Storage There are three main ways you interact with BigQuery: Loading and exporting data Querying and viewing data Managing data To perform these interactions, you can use the following: The  Cloud Console The  bq  command-line tool The BigQuery  REST API  or  client libraries Loading and exporting data In most cases, you  load data into BigQuery storage . If you want to get the data back out of BigQuery, you can  export the data . Alterna

BigQuery Organisation

Image
BigQuery is structured as a hierarchy with 4 levels:  Projects: Top-level containers in the Google Cloud Platform that store the data  Datasets: Within projects, datasets hold one or more tables of data  Tables: Within datasets, tables are row-column structures that hold actual data  Jobs: The tasks you are performing on the data, such as running queries, loading data, and exporting data Projects  Projects are the top-level containers that store the data  Within the project, you can configure settings, permissions, and other metadata that describe your applications  Each project has a name, ID, and number that you’ll use as identifiers  When billing is enabled, each project is associated with one billing account but multiple projects can be billed to the same account Datasets  Datasets allow you to organize and control access to your tables  All tables must belong to a dataset. You must create a dataset before loading data into BigQuery  You can configure permissions at the organizatio

BigQuery Time Travel (Snapshot decorator)

  BigQuery Time Travel and Undelete using Snapshot decorator What is Time Travel in BigQuery With Time travel, You can access the data in the table from any point (eg. any particular second) within the last seven days, even if the data has been deleted. You can use time travel to query data that was updated or deleted, restore a table that was deleted, or restore a table that expired. Better yet, there is no additional cost to enable this feature and there are no maintenance overheads as well How to query a table using Time travel in BigQuery If you have to take a look at the data in a table as of any particular time, all you have to do is add the AS OF Suffix (also known as snapshot decorator) SELECT * FROM table FOR SYSTEM_TIME AS OF '2020-01-01 12:00:00-07:00'; You can also use relative time as shown below: SELECT * FROM `mydataset.mytable` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR); with Time Travel you can query up to 7 days in the past H