29 Matching Annotations
  1. Dec 2023
    1. Technique #2: Sampling

      How do you load only a subset of the rows?

      When you load your data, you can specify a skiprows function that will randomly decide whether to load that row or not:


      ```python
      import pandas as pd
      from random import random

      def sample(row_number):
          if row_number == 0:
              # Never drop the row with column names:
              return False
          # random() returns uniform numbers between 0 and 1,
          # so this keeps roughly 0.1% of the rows:
          return random() > 0.001

      sampled = pd.read_csv("/tmp/voting.csv", skiprows=sample)
      len(sampled)  # 973
      ```

    2. lossy compression: drop some of your data in a way that doesn’t impact your final results too much.

      If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
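
      In the same spirit, a hedged sketch (my example, not from the annotated article): read_csv's usecols parameter skips whole columns at load time, so extraneous details are never kept in memory at all. The file here is a hypothetical stand-in:

      ```python
      import io
      import pandas as pd

      # Hypothetical CSV standing in for a large file on disk:
      csv = io.StringIO("id,name,score,notes\n1,ann,9.5,long text\n2,bob,7.0,more text\n")

      # Load only the columns the analysis needs; the others are dropped at parse time.
      df = pd.read_csv(csv, usecols=["id", "score"])
      ```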

  2. Jul 2023
    1. The parameter by specifies the columns, and ascending takes a list defining the sort direction for each column. In this case, we sort by Country name first, in descending (lexicographical) order, and then by number of Employees in ascending order.

      A pandas DataFrame can be sorted by multiple columns at once.
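
      A minimal sketch of multi-column sorting (illustrative data, not from the annotated article):

      ```python
      import pandas as pd

      df = pd.DataFrame({
          "Country": ["US", "US", "DE", "DE"],
          "Employees": [50, 10, 30, 20],
      })

      # Country descending first, then Employees ascending within each country:
      out = df.sort_values(by=["Country", "Employees"], ascending=[False, True])
      ```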

    1. pd.read_csv

      It reads the data in with pandas.

    2. df.head()

      Reads the head of the data very easily.

    3. X = df['Head Size(cm^3)']
       y = df['Brain Weight(grams)']

      Here it introduces its Feature and its Label.
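
      Putting these annotations together, a small sketch; the CSV here is a made-up stand-in for the head-size dataset, with only the column names taken from the annotated code:

      ```python
      import io
      import pandas as pd

      # Two made-up rows in place of the real dataset file:
      csv = io.StringIO(
          "Head Size(cm^3),Brain Weight(grams)\n"
          "4512,1530\n"
          "3738,1297\n"
      )
      df = pd.read_csv(csv)   # pandas reads the file
      df.head()               # first rows, for a quick look

      X = df['Head Size(cm^3)']        # the feature
      y = df['Brain Weight(grams)']    # the label
      ```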

    1. Bollinger bands are just a simple visualization/analysis technique that creates two bands, one "roof" and one "floor" of some "support" for a given time series. The reasoning is that, if the time series is "below" the "floor", it's a historic low, and if it's "above" the "roof", it's a historic high. In terms of stock prices and other financial instruments, when the price crosses a band, it's said to be too cheap or too expensive.

      How to display Bollinger bands with Pandas.
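
      A sketch of the bands with rolling windows; 20 periods and 2 standard deviations are common defaults, and the price series is made up:

      ```python
      import pandas as pd

      # Stand-in for a real closing-price series:
      close = pd.Series(range(1, 61), dtype=float)

      middle = close.rolling(window=20).mean()   # the "support" line
      std = close.rolling(window=20).std()
      upper = middle + 2 * std                   # the "roof"
      lower = middle - 2 * std                   # the "floor"

      bands = pd.DataFrame({"close": close, "upper": upper, "middle": middle, "lower": lower})
      # bands.plot() draws the price between the two bands.
      ```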

  3. May 2023
    1. Panda

      Using this function, you can get the Count, Avg, and other statistics for the numeric columns.
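
      This presumably refers to describe(); a minimal sketch:

      ```python
      import pandas as pd

      df = pd.DataFrame({"age": [20, 30, 40], "name": ["a", "b", "c"]})

      # count, mean, std, min, quartiles, max; numeric columns only by default:
      stats = df.describe()
      ```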

    1. Panda

      Returns the number of rows and columns of that DataFrame.
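
      This presumably refers to the shape attribute:

      ```python
      import pandas as pd

      df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
      rows, cols = df.shape   # (number of rows, number of columns)
      ```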

    1. Return the first 5 rows of the DataFrame

      Returns the first 5 rows for you. It may also take an input, which is the number of rows it should return.
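
      For example:

      ```python
      import pandas as pd

      df = pd.DataFrame({"x": range(10)})
      first_five = df.head()    # default: the first 5 rows
      first_two = df.head(2)    # optional argument: number of rows to return
      ```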

    1. Pandas is a Python library.

      One of the good Python libraries.

  4. Apr 2023
    1. ff = ef['x','y']

      Masks in pandas are a way to select a subset of data from a DataFrame, Series, or other data object based on a boolean condition.

      The code that should be added in place of the # TODO is:

      ff = ef[['x', 'y']]

      This selects only the columns 'x' and 'y' from the DataFrame ef, which is the result of the mask m. The mask m selects only the rows where the value of column 'z' is False, so ef contains only those rows. Finally, ff is created by selecting the columns 'x' and 'y' from ef.
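
      A self-contained sketch of that whole mask flow; the data is made up, and only the column and variable names come from the exercise:

      ```python
      import pandas as pd

      df = pd.DataFrame({
          "x": [1, 2, 3],
          "y": [4, 5, 6],
          "z": [False, True, False],
      })

      m = ~df["z"]           # boolean mask: rows where 'z' is False
      ef = df[m]             # keep only those rows
      ff = ef[["x", "y"]]    # then keep only the columns 'x' and 'y'
      ```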

  5. Dec 2021
  6. Nov 2021
  7. Sep 2021
  8. Aug 2020
  9. Mar 2020
    1. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

      Pandas code for the same query logic is usually written in this order:

      1. JOIN
      2. WHERE
      3. GROUP BY
      4. HAVING


      1. df = thing1.join(thing2) # like a JOIN
      2. df = df[df.created_at > 1000] # like a WHERE
      3. df = df.groupby('something').agg(num_yes=('yes', 'sum')) # like a GROUP BY
      4. df = df[df.num_yes > 2] # like a HAVING, filtering on the result of a GROUP BY
      5. df = df[['num_yes', 'something1', 'something']] # pick the columns I want to display, like a SELECT
      6. df = df.sort_values('something', ascending=True) # like an ORDER BY
      7. df = df[:30] # like a LIMIT
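
      A runnable version of that pipeline, with made-up users/events tables standing in for thing1/thing2:

      ```python
      import pandas as pd

      users = pd.DataFrame({"user_id": [1, 2], "name": ["ann", "bob"]}).set_index("user_id")
      events = pd.DataFrame({
          "user_id": [1, 1, 1, 2, 2],
          "created_at": [500, 1500, 2000, 1500, 2500],
          "yes": [1, 1, 1, 1, 0],
      }).set_index("user_id")

      df = events.join(users)                              # JOIN on the shared index
      df = df[df.created_at > 1000]                        # WHERE
      df = df.groupby("name").agg(num_yes=("yes", "sum"))  # GROUP BY
      df = df[df.num_yes >= 2]                             # HAVING
      df = df.reset_index()[["name", "num_yes"]]           # SELECT
      df = df.sort_values("num_yes", ascending=True)       # ORDER BY
      df = df[:30]                                         # LIMIT
      ```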
  10. Nov 2019
  11. Oct 2019
    1. Indicate number of NA values placed in non-numeric columns.

      This is only true when using the Python parsing engine.

      Filled 3 NA values in column name

      If using the C parsing engine you get something like the following output:

      Tokenization took: 0.01 ms
      Type conversion took: 0.70 ms
      Parser memory cleanup took: 0.01 ms
  12. Feb 2019
  13. Jun 2018
  14. May 2018
  15. Apr 2018
  16. Mar 2018
    1. I'll skip the inefficient method I used before with a custom groupby aggregation, and go for a neat trick using the mighty transform method.

      A more constrained, and thus more efficient, way to do transformations on groupbys than the apply method. You can do very cool stuff with it. For those of you who know Splunk: this is like its neat "streamstats" and "eventstats" capabilities.
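
      A small sketch of the idea (my example): transform returns a result aligned to the original rows, which is what makes the eventstats-style "each row gets its group's statistic" pattern cheap:

      ```python
      import pandas as pd

      df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 3, 2, 6]})

      # Each row gets its own group's mean, aligned to the original index:
      df["group_mean"] = df.groupby("group")["value"].transform("mean")
      ```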

  17. Dec 2017